Web Speech API
The Web Speech API lets you integrate voice data into web applications. It consists of two main parts:
- Speech Recognition: converts speech to text
- Speech Synthesis: converts text to speech
Speech Recognition
Basic Concepts
Speech recognition is exposed through the SpeechRecognition interface, which turns speech input into text.
Core Interfaces
- SpeechRecognition: controller interface for the recognition service
- SpeechRecognitionEvent: event object carrying recognition results
- SpeechRecognitionErrorEvent: error event object
- SpeechRecognitionAlternative: a single recognition hypothesis
- SpeechRecognitionResult: all hypotheses for one recognition
Basic Usage
```javascript
// Create a recognizer instance (note the vendor prefix)
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
const recognition = new SpeechRecognition()

// Configuration
recognition.continuous = true // keep recognizing after each result
recognition.interimResults = true // emit interim (non-final) results
recognition.lang = 'zh-CN' // recognition language
recognition.maxAlternatives = 1 // max hypotheses per result

// Listen for results
recognition.onresult = (event) => {
  const results = event.results
  const lastResult = results[results.length - 1]
  const transcript = lastResult[0].transcript
  if (lastResult.isFinal) {
    console.log('Final result:', transcript)
  } else {
    console.log('Interim result:', transcript)
  }
}

// Listen for errors
recognition.onerror = (event) => {
  console.error('Recognition error:', event.error)
}

// Listen for the end of recognition
recognition.onend = () => {
  console.log('Recognition ended')
}

// Start recognizing (attach handlers before calling start)
recognition.start()

// Stop recognizing
// recognition.stop();

// Abort recognizing
// recognition.abort();
```
Key Properties
| Property | Type | Description |
|---|---|---|
| lang | String | Recognition language (e.g. 'zh-CN', 'en-US') |
| continuous | Boolean | Keep recognizing after each result; defaults to false |
| interimResults | Boolean | Emit interim results; defaults to false |
| maxAlternatives | Number | Maximum number of alternative hypotheses per result |
| grammars | SpeechGrammarList | Grammar list used to bias recognition toward specific phrases |
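The grammars property accepts a SpeechGrammarList built from JSGF grammar strings. A minimal sketch, assuming a hypothetical buildJSGF helper for assembling the grammar string; note that grammar support is limited and some engines silently ignore it:

```javascript
// Hypothetical helper: assemble a JSGF grammar string from a word list
function buildJSGF(ruleName, words) {
  return `#JSGF V1.0; grammar ${ruleName}; public <${ruleName}> = ${words.join(' | ')} ;`
}

// Browser-only wiring (guarded so the helper can also run outside a browser)
if (typeof window !== 'undefined') {
  const SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList
  const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
  if (SpeechGrammarList && SpeechRecognition) {
    const recognition = new SpeechRecognition()
    const list = new SpeechGrammarList()
    // addFromString takes the grammar string and a weight (0-1)
    list.addFromString(buildJSGF('colors', ['red', 'green', 'blue']), 1)
    recognition.grammars = list
  }
}
```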
Common Events
| Event | Description |
|---|---|
| start | Recognition service started |
| end | Recognition service ended |
| result | Recognition result returned |
| error | An error occurred |
| audiostart | User agent started capturing audio |
| audioend | User agent stopped capturing audio |
| soundstart | Sound detected |
| soundend | Sound stopped |
| speechstart | Speech detected |
| speechend | Speech stopped |
| nomatch | Recognition service returned no match |
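The audio/sound/speech lifecycle events are useful for showing recognition status in the UI. A sketch, assuming hypothetical STATUS_LABELS and attachStatusHandlers names:

```javascript
// Hypothetical labels for recognizer lifecycle events (for UI feedback)
const STATUS_LABELS = {
  audiostart: 'Microphone open',
  soundstart: 'Sound detected',
  speechstart: 'Speech detected',
  speechend: 'Speech ended',
  audioend: 'Microphone closed',
}

// Wire every lifecycle event to a single status callback
function attachStatusHandlers(recognition, onStatus) {
  for (const [event, label] of Object.entries(STATUS_LABELS)) {
    recognition['on' + event] = () => onStatus(label)
  }
  return recognition
}
```

With a real recognizer, `attachStatusHandlers(recognition, (s) => { statusEl.textContent = s })` keeps a status element in sync as the microphone opens, speech is detected, and capture ends.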
Error Types
```javascript
recognition.onerror = (event) => {
  switch (event.error) {
    case 'no-speech':
      console.log('No speech detected')
      break
    case 'aborted':
      console.log('Recognition aborted')
      break
    case 'audio-capture':
      console.log('Audio capture failed')
      break
    case 'network':
      console.log('Network error')
      break
    case 'not-allowed':
      console.log('Microphone permission not granted')
      break
    case 'service-not-allowed':
      console.log('Service not allowed')
      break
  }
}
```
Complete Example
```javascript
class VoiceRecognition {
  constructor() {
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
    if (!SpeechRecognition) {
      throw new Error('This browser does not support speech recognition')
    }
    this.recognition = new SpeechRecognition()
    this.isRecognizing = false
    this.setupRecognition()
  }

  setupRecognition() {
    this.recognition.continuous = true
    this.recognition.interimResults = true
    this.recognition.lang = 'zh-CN'

    this.recognition.onstart = () => {
      this.isRecognizing = true
      console.log('Recognition started')
    }

    this.recognition.onresult = (event) => {
      let interimTranscript = ''
      let finalTranscript = ''
      // Only walk the results added since the last event
      for (let i = event.resultIndex; i < event.results.length; i++) {
        const transcript = event.results[i][0].transcript
        if (event.results[i].isFinal) {
          finalTranscript += transcript
        } else {
          interimTranscript += transcript
        }
      }
      this.onResult(finalTranscript, interimTranscript)
    }

    this.recognition.onerror = (event) => {
      this.onError(event.error)
    }

    this.recognition.onend = () => {
      this.isRecognizing = false
      console.log('Recognition ended')
    }
  }

  start() {
    if (!this.isRecognizing) {
      this.recognition.start()
    }
  }

  stop() {
    if (this.isRecognizing) {
      this.recognition.stop()
    }
  }

  onResult(final, interim) {
    // Override in a subclass or assign from outside
    console.log('Final result:', final)
    console.log('Interim result:', interim)
  }

  onError(error) {
    // Override in a subclass or assign from outside
    console.error('Error:', error)
  }
}

// Usage
const voiceRec = new VoiceRecognition()
voiceRec.onResult = (final, interim) => {
  if (final) {
    document.getElementById('final').textContent = final
  }
  document.getElementById('interim').textContent = interim
}
voiceRec.start()
```
Speech Synthesis
Basic Concepts
Speech synthesis is exposed through the SpeechSynthesis interface, which turns text into spoken audio.
Core Interfaces
- SpeechSynthesis: controller interface for synthesis
- SpeechSynthesisUtterance: a speech request object
- SpeechSynthesisVoice: a voice object
- SpeechSynthesisEvent: a synthesis event object
Basic Usage
```javascript
// Create an utterance
const utterance = new SpeechSynthesisUtterance('你好,世界')

// Configuration
utterance.lang = 'zh-CN' // language
utterance.pitch = 1 // pitch (0-2)
utterance.rate = 1 // speaking rate (0.1-10)
utterance.volume = 1 // volume (0-1)

// Pick a voice
const voices = speechSynthesis.getVoices()
utterance.voice = voices.find((v) => v.lang === 'zh-CN')

// Listen for events
utterance.onstart = () => console.log('Playback started')
utterance.onend = () => console.log('Playback ended')
utterance.onerror = (e) => console.error('Playback error', e)

// Speak
speechSynthesis.speak(utterance)

// Control methods
// speechSynthesis.pause(); // pause
// speechSynthesis.resume(); // resume
// speechSynthesis.cancel(); // cancel
```
SpeechSynthesisUtterance Properties
| Property | Type | Default | Description |
|---|---|---|---|
| text | String | '' | Text to synthesize |
| lang | String | document.documentElement.lang | Language |
| voice | SpeechSynthesisVoice | null | Voice to use |
| volume | Number | 1 | Volume (0-1) |
| rate | Number | 1 | Speaking rate (0.1-10) |
| pitch | Number | 1 | Pitch (0-2) |
SpeechSynthesis Methods
| Method | Description |
|---|---|
| speak(utterance) | Adds an utterance to the queue |
| cancel() | Removes all utterances from the queue |
| pause() | Pauses playback |
| resume() | Resumes playback |
| getVoices() | Returns the list of available voices |
SpeechSynthesis Properties
| Property | Type | Description |
|---|---|---|
| paused | Boolean | Whether playback is paused |
| pending | Boolean | Whether the queue contains utterances not yet spoken |
| speaking | Boolean | Whether an utterance is being spoken |
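The paused and speaking flags make a single play/pause toggle button straightforward. A minimal sketch, assuming a hypothetical togglePlayback helper; it works with any object exposing the same flags and methods:

```javascript
// Hypothetical toggle: resume if paused, pause if speaking, otherwise do nothing
function togglePlayback(synth) {
  if (synth.paused) {
    synth.resume()
    return 'resumed'
  }
  if (synth.speaking) {
    synth.pause()
    return 'paused'
  }
  return 'idle'
}
```

In a browser, `togglePlayback(window.speechSynthesis)` can back one play/pause button; note that while playback is paused, speaking remains true, which is why the paused check comes first.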
Events
| Event | Description |
|---|---|
| start | Playback started |
| end | Playback ended |
| pause | Playback paused |
| resume | Playback resumed |
| error | An error occurred |
| boundary | A word or sentence boundary was reached |
| mark | An SSML mark was reached |
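The boundary event reports a charIndex into the utterance text, which can drive word-by-word highlighting while speaking. A sketch, assuming a hypothetical currentWord helper; boundary granularity varies by engine and voice:

```javascript
// Hypothetical helper: extract the word starting at charIndex
function currentWord(text, charIndex) {
  const match = text.slice(charIndex).match(/^\S+/)
  return match ? match[0] : ''
}

// Browser-only wiring (guarded so the helper can run anywhere)
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const text = 'The quick brown fox'
  const utterance = new SpeechSynthesisUtterance(text)
  utterance.onboundary = (event) => {
    console.log('Speaking:', currentWord(text, event.charIndex))
  }
  window.speechSynthesis.speak(utterance)
}
```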
Getting Available Voices
```javascript
// Note: the voice list may load asynchronously
let voices = []

function loadVoices() {
  voices = speechSynthesis.getVoices()
  // Group by language
  const voicesByLang = voices.reduce((acc, voice) => {
    const lang = voice.lang.split('-')[0]
    if (!acc[lang]) acc[lang] = []
    acc[lang].push(voice)
    return acc
  }, {})
  console.log('Available voices:', voicesByLang)
}

// Initial load
loadVoices()

// Listen for voice list changes (required in some browsers)
speechSynthesis.onvoiceschanged = loadVoices
```
Complete Example
```javascript
class TextToSpeech {
  constructor() {
    this.synth = window.speechSynthesis
    this.voices = []
    this.loadVoices()
  }

  loadVoices() {
    this.voices = this.synth.getVoices()
    if (this.voices.length === 0) {
      this.synth.onvoiceschanged = () => {
        this.voices = this.synth.getVoices()
      }
    }
  }

  speak(text, options = {}) {
    // Cancel whatever is currently playing
    this.synth.cancel()

    const utterance = new SpeechSynthesisUtterance(text)

    // Configuration (?? instead of || so an explicit volume of 0 is kept)
    utterance.lang = options.lang || 'zh-CN'
    utterance.pitch = options.pitch ?? 1
    utterance.rate = options.rate ?? 1
    utterance.volume = options.volume ?? 1

    // Pick a voice
    if (options.voiceName) {
      utterance.voice = this.voices.find((v) => v.name === options.voiceName)
    } else {
      utterance.voice = this.voices.find((v) => v.lang === utterance.lang)
    }

    // Event handlers
    utterance.onstart = () => {
      console.log('Playback started')
      options.onStart?.()
    }
    utterance.onend = () => {
      console.log('Playback ended')
      options.onEnd?.()
    }
    utterance.onerror = (event) => {
      console.error('Playback error:', event)
      options.onError?.(event)
    }
    utterance.onpause = () => {
      console.log('Playback paused')
    }
    utterance.onresume = () => {
      console.log('Playback resumed')
    }

    // Speak
    this.synth.speak(utterance)
  }

  pause() {
    this.synth.pause()
  }

  resume() {
    this.synth.resume()
  }

  cancel() {
    this.synth.cancel()
  }

  getVoices(lang) {
    if (lang) {
      return this.voices.filter((v) => v.lang.startsWith(lang))
    }
    return this.voices
  }
}

// Usage
const tts = new TextToSpeech()
tts.speak('你好,世界', {
  lang: 'zh-CN',
  rate: 1.2,
  pitch: 1,
  onStart: () => console.log('Started'),
  onEnd: () => console.log('Ended'),
})
```
Browser Compatibility
Speech Recognition
- ✅ Chrome/Edge (with the webkit prefix)
- ✅ Safari (with the webkit prefix)
- ⚠️ Firefox (partial support, behind an experimental flag)
- ❌ Opera
Speech Synthesis
- ✅ Chrome/Edge
- ✅ Safari
- ✅ Firefox
- ✅ Opera
Feature Detection
```javascript
// Detect speech recognition support
const supportsSpeechRecognition = 'SpeechRecognition' in window || 'webkitSpeechRecognition' in window

// Detect speech synthesis support
const supportsSpeechSynthesis = 'speechSynthesis' in window

if (!supportsSpeechRecognition) {
  console.warn('This browser does not support speech recognition')
}
if (!supportsSpeechSynthesis) {
  console.warn('This browser does not support speech synthesis')
}
```
Practical Use Cases
1. Voice Input
```javascript
// Voice input into a text field
const input = document.getElementById('voice-input')
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)()

recognition.continuous = false
recognition.interimResults = false
recognition.lang = 'zh-CN'

recognition.onresult = (event) => {
  const transcript = event.results[0][0].transcript
  input.value = transcript
}

document.getElementById('mic-btn').onclick = () => {
  recognition.start()
}
```
2. Reading Text Aloud
```javascript
// Read an article aloud
function readArticle() {
  const article = document.querySelector('article').textContent
  const utterance = new SpeechSynthesisUtterance(article)
  utterance.lang = 'zh-CN'
  utterance.rate = 1
  speechSynthesis.speak(utterance)
}
```
3. Voice Control
```javascript
// Voice command control
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)()
recognition.continuous = true
recognition.lang = 'zh-CN'

recognition.onresult = (event) => {
  const command = event.results[event.results.length - 1][0].transcript.trim()
  switch (command) {
    case '打开菜单': // "open menu"
      openMenu()
      break
    case '关闭菜单': // "close menu"
      closeMenu()
      break
    case '滚动到顶部': // "scroll to top"
      window.scrollTo(0, 0)
      break
    case '滚动到底部': // "scroll to bottom"
      window.scrollTo(0, document.body.scrollHeight)
      break
  }
}
```
4. Accessibility
```javascript
// Read page elements aloud for visually impaired users
function speakElement(element) {
  const text = element.textContent || element.alt || element.getAttribute('aria-label')
  if (text) {
    const utterance = new SpeechSynthesisUtterance(text)
    utterance.lang = document.documentElement.lang || 'zh-CN'
    speechSynthesis.speak(utterance)
  }
}

// Read the content of whichever element receives focus
document.addEventListener('focusin', (e) => {
  speakElement(e.target)
})
```
Caveats
Permissions
- Speech recognition requires the user to grant microphone permission
- It only works in secure (HTTPS) contexts, with localhost as an exception
- Some browsers require a user gesture before recognition can start
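The microphone permission state can sometimes be checked up front via the Permissions API. A sketch, assuming the 'microphone' permission name is supported (it is not in every browser, so the query is wrapped in try/catch); micPermissionMessage is a hypothetical helper:

```javascript
// Hypothetical mapping from permission state to a user-facing message
function micPermissionMessage(state) {
  if (state === 'granted') return 'Microphone access granted'
  if (state === 'denied') return 'Microphone access denied; check browser settings'
  return 'Microphone permission will be requested on first use'
}

// Browser-only check (guarded; falls back to 'prompt' when the query is unavailable)
async function checkMicPermission() {
  if (typeof navigator === 'undefined' || !navigator.permissions) return 'prompt'
  try {
    const status = await navigator.permissions.query({ name: 'microphone' })
    console.log(micPermissionMessage(status.state))
    return status.state
  } catch {
    return 'prompt' // permission name not supported in this browser
  }
}
```

Calling checkMicPermission() before starting recognition lets the UI explain a denied state instead of failing with a 'not-allowed' error.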
Performance
```javascript
// Avoid creating recognizer instances repeatedly
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)()

// Stop recognition as soon as it is no longer needed
recognition.onresult = (event) => {
  // Handle the result
  if (someCondition) {
    recognition.stop()
  }
}

// Speech synthesis queue management
function speakWithQueue(texts) {
  texts.forEach((text, index) => {
    const utterance = new SpeechSynthesisUtterance(text)
    utterance.onend = () => {
      console.log(`Segment ${index + 1} finished`)
    }
    speechSynthesis.speak(utterance)
  })
}
```
Best Practices
- Visual feedback: show the recognition state with an icon or animation
- Error handling: handle every error case gracefully
- User control: provide start/stop/pause buttons
- Privacy notice: tell users how their voice data is handled
- Fallback: offer an alternative for unsupported browsers
```javascript
// Full fallback handling
class SpeechHelper {
  constructor() {
    this.hasRecognition = 'SpeechRecognition' in window || 'webkitSpeechRecognition' in window
    this.hasSynthesis = 'speechSynthesis' in window
  }

  startRecognition(callback) {
    if (!this.hasRecognition) {
      alert('Your browser does not support speech recognition; please type instead')
      return
    }
    // Normal path
    const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)()
    recognition.onresult = callback
    recognition.start()
  }

  speak(text) {
    if (!this.hasSynthesis) {
      console.warn('This browser does not support speech synthesis')
      return
    }
    const utterance = new SpeechSynthesisUtterance(text)
    speechSynthesis.speak(utterance)
  }
}
```