Speech API

The Web Speech API lets you integrate voice data into web applications. It has two main parts:

  • Speech Recognition: converts speech to text
  • Speech Synthesis: converts text to speech

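Speech Recognition

Basic concepts

Speech recognition is implemented through the SpeechRecognition interface, which turns speech input into text.

Core interfaces

  • SpeechRecognition: the controller interface for the recognition service
  • SpeechRecognitionEvent: the event object that carries recognition results
  • SpeechRecognitionErrorEvent: the event object that carries an error
  • SpeechRecognitionAlternative: a single candidate transcription, with its transcript and confidence
  • SpeechRecognitionResult: all candidate transcriptions (alternatives) for one recognized phrase

Basic usage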

javascript
// Create a recognizer instance (note the vendor prefix)
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
const recognition = new SpeechRecognition()

// Configuration
recognition.continuous = true // keep recognizing after each result
recognition.interimResults = true // also return interim (non-final) results
recognition.lang = 'zh-CN' // recognition language
recognition.maxAlternatives = 1 // maximum number of alternatives per result

// Listen for results
recognition.onresult = (event) => {
  const results = event.results
  const lastResult = results[results.length - 1]
  const transcript = lastResult[0].transcript

  if (lastResult.isFinal) {
    console.log('Final result:', transcript)
  } else {
    console.log('Interim result:', transcript)
  }
}

// Listen for errors
recognition.onerror = (event) => {
  console.error('Recognition error:', event.error)
}

// Listen for the end of the session
recognition.onend = () => {
  console.log('Recognition ended')
}

// Start recognition (after the handlers are registered)
recognition.start()

// Stop recognition and return any result captured so far
// recognition.stop();

// Abort recognition without returning a result
// recognition.abort();

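Key properties

| Property | Type | Description |
| --- | --- | --- |
| lang | String | Recognition language (e.g. 'zh-CN', 'en-US') |
| continuous | Boolean | Whether to keep recognizing after the first result; default false |
| interimResults | Boolean | Whether to return interim results; default false |
| maxAlternatives | Number | Maximum number of alternatives returned per result; default 1 |
| grammars | SpeechGrammarList | Grammar list intended to improve recognition of specific vocabulary |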
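
When maxAlternatives is greater than 1, each result carries several candidates. The sketch below shows one way to pick the candidate with the highest confidence. The helper name pickBestAlternative is hypothetical, and how meaningful the confidence value is depends on the browser's recognition service.

```javascript
// Pick the highest-confidence alternative from one SpeechRecognitionResult.
// `result` only needs to be array-like: result.length, result[i].transcript,
// result[i].confidence. pickBestAlternative is an illustrative helper name.
function pickBestAlternative(result) {
  let best = null
  for (let i = 0; i < result.length; i++) {
    const alternative = result[i]
    if (best === null || alternative.confidence > best.confidence) {
      best = alternative
    }
  }
  return best // null when the result has no alternatives
}

// Browser usage (assumes `recognition` was created as in the basic usage section):
// recognition.maxAlternatives = 3
// recognition.onresult = (event) => {
//   const best = pickBestAlternative(event.results[event.results.length - 1])
//   if (best) console.log(best.transcript, best.confidence)
// }
```

Taking the plain array-like shape as input keeps the helper independent of the browser objects, so it can be exercised without a microphone.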

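Common events

| Event | Description |
| --- | --- |
| start | The recognition service has started |
| end | The recognition service has ended |
| result | A recognition result is available |
| error | An error occurred |
| audiostart | The user agent has started capturing audio |
| audioend | The user agent has stopped capturing audio |
| soundstart | Sound has been detected |
| soundend | Sound is no longer detected |
| speechstart | Speech has been detected |
| speechend | Speech is no longer detected |
| nomatch | The recognition service returned no matching result |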

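Error types

javascript
recognition.onerror = (event) => {
  switch (event.error) {
    case 'no-speech':
      console.log('No speech was detected')
      break
    case 'aborted':
      console.log('Recognition was aborted')
      break
    case 'audio-capture':
      console.log('Audio capture failed')
      break
    case 'network':
      console.log('Network error')
      break
    case 'not-allowed':
      console.log('Microphone permission was not granted')
      break
    case 'service-not-allowed':
      console.log('The speech recognition service is not allowed')
      break
    default:
      console.log('Unhandled error:', event.error)
  }
}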
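
The same error codes can also be kept in a lookup table, which makes it easier to show a message in the UI and decide whether to try again. This is a minimal sketch: the names RECOGNITION_ERRORS and describeRecognitionError are hypothetical, and the retry flags are assumptions about what is worth retrying, not something the API defines.

```javascript
// Lookup table from SpeechRecognitionErrorEvent.error codes to a message and
// a retry hint. The retry values are illustrative assumptions.
const RECOGNITION_ERRORS = new Map([
  ['no-speech', { message: 'No speech was detected', retry: true }],
  ['aborted', { message: 'Recognition was aborted', retry: false }],
  ['audio-capture', { message: 'Audio capture failed', retry: false }],
  ['network', { message: 'Network error', retry: true }],
  ['not-allowed', { message: 'Microphone permission was not granted', retry: false }],
  ['service-not-allowed', { message: 'The speech recognition service is not allowed', retry: false }],
  ['language-not-supported', { message: 'The requested language is not supported', retry: false }],
])

function describeRecognitionError(code) {
  // Unknown codes get a generic entry instead of undefined
  return RECOGNITION_ERRORS.get(code) ?? { message: `Unknown error: ${code}`, retry: false }
}

// Browser usage:
// recognition.onerror = (event) => {
//   const { message, retry } = describeRecognitionError(event.error)
//   console.warn(message, retry ? '(will retry)' : '')
// }
```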

Complete example

javascript
class VoiceRecognition {
  constructor() {
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition

    if (!SpeechRecognition) {
      throw new Error('This browser does not support speech recognition')
    }

    this.recognition = new SpeechRecognition()
    this.isRecognizing = false
    this.setupRecognition()
  }

  setupRecognition() {
    this.recognition.continuous = true
    this.recognition.interimResults = true
    this.recognition.lang = 'zh-CN'

    this.recognition.onstart = () => {
      this.isRecognizing = true
      console.log('Recognition started')
    }

    this.recognition.onresult = (event) => {
      let interimTranscript = ''
      let finalTranscript = ''

      // Only walk the results that changed since the last event
      for (let i = event.resultIndex; i < event.results.length; i++) {
        const transcript = event.results[i][0].transcript
        if (event.results[i].isFinal) {
          finalTranscript += transcript
        } else {
          interimTranscript += transcript
        }
      }

      this.onResult(finalTranscript, interimTranscript)
    }

    this.recognition.onerror = (event) => {
      this.onError(event.error)
    }

    this.recognition.onend = () => {
      this.isRecognizing = false
      console.log('Recognition ended')
    }
  }

  start() {
    if (!this.isRecognizing) {
      this.recognition.start()
    }
  }

  stop() {
    if (this.isRecognizing) {
      this.recognition.stop()
    }
  }

  onResult(final, interim) {
    // Override in a subclass or assign from outside
    console.log('Final result:', final)
    console.log('Interim result:', interim)
  }

  onError(error) {
    // Override in a subclass or assign from outside
    console.error('Error:', error)
  }
}

// Usage (assumes elements with the ids "final" and "interim" exist on the page)
const voiceRec = new VoiceRecognition()
voiceRec.onResult = (final, interim) => {
  if (final) {
    document.getElementById('final').textContent = final
  }
  document.getElementById('interim').textContent = interim
}
voiceRec.start()
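
A recognition session can end on its own, for example after an error, even when continuous is true. One possible way to keep listening is to restart the recognizer from its end handler until the user asks to stop. The sketch below assumes a hypothetical helper named keepListening; it chains any handlers that are already assigned and stops retrying on permission errors so it does not loop forever.

```javascript
// Restart a recognizer whenever it ends, until stop() is called.
// `recognition` is any object with start(), stop(), onend and onerror,
// such as a SpeechRecognition instance. keepListening is an illustrative name.
function keepListening(recognition) {
  let wanted = false
  const previousEnd = recognition.onend
  const previousError = recognition.onerror

  recognition.onerror = (event) => {
    previousError?.call(recognition, event)
    // Permission errors will not fix themselves, so stop restarting
    if (event.error === 'not-allowed' || event.error === 'service-not-allowed') {
      wanted = false
    }
  }

  recognition.onend = (event) => {
    previousEnd?.call(recognition, event)
    if (wanted) recognition.start()
  }

  return {
    start() {
      if (!wanted) {
        wanted = true
        recognition.start()
      }
    },
    stop() {
      wanted = false
      recognition.stop()
    },
  }
}

// Browser usage:
// const listener = keepListening(new (window.SpeechRecognition || window.webkitSpeechRecognition)())
// listener.start()
```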

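Speech Synthesis

Basic concepts

Speech synthesis is implemented through the SpeechSynthesis interface, which turns text into spoken output.

Core interfaces

  • SpeechSynthesis: the controller interface for the synthesis service
  • SpeechSynthesisUtterance: a speech request (the text to speak and how to speak it)
  • SpeechSynthesisVoice: a voice that the system can speak with
  • SpeechSynthesisEvent: the event object for synthesis events

Basic usage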

javascript
// Create an utterance
const utterance = new SpeechSynthesisUtterance('你好,世界') // "Hello, world"

// Configuration
utterance.lang = 'zh-CN' // language
utterance.pitch = 1 // pitch (0-2)
utterance.rate = 1 // speaking rate (0.1-10)
utterance.volume = 1 // volume (0-1)

// Choose a voice (the list may still be empty if voices load asynchronously)
const voices = speechSynthesis.getVoices()
utterance.voice = voices.find((v) => v.lang === 'zh-CN') ?? null

// Listen for events
utterance.onstart = () => console.log('Playback started')
utterance.onend = () => console.log('Playback finished')
utterance.onerror = (e) => console.error('Playback error', e)

// Speak
speechSynthesis.speak(utterance)

// Control methods
// speechSynthesis.pause();   // pause
// speechSynthesis.resume();  // resume
// speechSynthesis.cancel();  // cancel

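SpeechSynthesisUtterance properties

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| text | String | '' | Text to synthesize |
| lang | String | document.documentElement.lang | Language |
| voice | SpeechSynthesisVoice | null | Voice to use |
| volume | Number | 1 | Volume (0-1) |
| rate | Number | 1 | Speaking rate (0.1-10) |
| pitch | Number | 1 | Pitch (0-2) |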
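
Because each numeric property has a fixed range, it can help to clamp user-supplied values before assigning them. The following is a minimal sketch under that assumption; clampUtteranceOptions and UTTERANCE_RANGES are hypothetical names, and the ranges and defaults are the ones listed in the table above.

```javascript
// Allowed ranges and defaults for the numeric utterance properties
const UTTERANCE_RANGES = {
  volume: { min: 0, max: 1, fallback: 1 },
  rate: { min: 0.1, max: 10, fallback: 1 },
  pitch: { min: 0, max: 2, fallback: 1 },
}

// Return { volume, rate, pitch } with every value forced into its range.
// Missing or non-numeric values fall back to the default.
function clampUtteranceOptions(options = {}) {
  const result = {}
  for (const [name, { min, max, fallback }] of Object.entries(UTTERANCE_RANGES)) {
    const value = Number(options[name])
    result[name] =
      options[name] == null || Number.isNaN(value)
        ? fallback
        : Math.min(max, Math.max(min, value))
  }
  return result
}

// Browser usage:
// const utterance = new SpeechSynthesisUtterance('Hello')
// Object.assign(utterance, clampUtteranceOptions({ rate: 1.5, volume: 0 }))
```

Note that a value of 0 is kept for volume and pitch, since 0 is inside their ranges.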

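SpeechSynthesis methods

| Method | Description |
| --- | --- |
| speak(utterance) | Adds the utterance to the speech queue |
| cancel() | Removes all utterances from the queue |
| pause() | Pauses playback |
| resume() | Resumes playback |
| getVoices() | Returns the list of available voices |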

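SpeechSynthesis properties

| Property | Type | Description |
| --- | --- | --- |
| paused | Boolean | Whether playback is paused |
| pending | Boolean | Whether the queue holds utterances that have not started yet |
| speaking | Boolean | Whether an utterance is currently being spoken |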

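Events

| Event | Description |
| --- | --- |
| start | Playback has started |
| end | Playback has finished |
| pause | Playback was paused |
| resume | Playback was resumed |
| error | An error occurred |
| boundary | A word or sentence boundary was reached |
| mark | An SSML mark was reached |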
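
The boundary event reports a charIndex into the utterance text, which can be used to highlight the word being spoken. The sketch below is one possible way to extract that word. wordAtBoundary is a hypothetical helper; it uses charLength when the event provides it and otherwise reads up to the next whitespace, so the fallback is not suitable for text without spaces between words (such as Chinese).

```javascript
// Return the word that starts at charIndex in text.
// charLength is optional because a boundary event may not report it.
function wordAtBoundary(text, charIndex, charLength) {
  if (charIndex < 0 || charIndex >= text.length) return ''
  if (Number.isInteger(charLength) && charLength > 0) {
    return text.slice(charIndex, charIndex + charLength)
  }
  // Fallback: read up to the next whitespace character
  const match = /^\S+/.exec(text.slice(charIndex))
  return match ? match[0] : ''
}

// Browser usage:
// const text = 'hello brave world'
// const utterance = new SpeechSynthesisUtterance(text)
// utterance.onboundary = (event) => {
//   console.log('Speaking:', wordAtBoundary(text, event.charIndex, event.charLength))
// }
// speechSynthesis.speak(utterance)
```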

Getting available voices

javascript
// Note: the voice list may load asynchronously
let voices = []

function loadVoices() {
  voices = speechSynthesis.getVoices()

  // Group by primary language subtag (e.g. 'zh' for 'zh-CN')
  const voicesByLang = voices.reduce((acc, voice) => {
    const lang = voice.lang.split('-')[0]
    if (!acc[lang]) acc[lang] = []
    acc[lang].push(voice)
    return acc
  }, {})

  console.log('Available voices:', voicesByLang)
}

// Initial load
loadVoices()

// Reload when the voice list changes (needed in some browsers)
speechSynthesis.onvoiceschanged = loadVoices
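
Once the list is loaded, an exact match on voice.lang can fail even though a voice for the same language exists under another region. The following sketch shows one possible fallback order: exact tag, then same primary language, then the default voice. pickVoice and normalizeLang are hypothetical helpers, and treating underscores like hyphens is a defensive assumption about how a platform might format language tags.

```javascript
// Normalize a language tag for comparison ('zh_CN' and 'zh-cn' both become 'zh-cn')
function normalizeLang(tag) {
  return String(tag).replace(/_/g, '-').toLowerCase()
}

// Choose a voice for `lang` from an array of SpeechSynthesisVoice-like objects
// ({ name, lang, default }). Returns null when the array is empty.
function pickVoice(voices, lang) {
  const wanted = normalizeLang(lang)
  const base = wanted.split('-')[0]
  return (
    voices.find((v) => normalizeLang(v.lang) === wanted) ||
    voices.find((v) => normalizeLang(v.lang).split('-')[0] === base) ||
    voices.find((v) => v.default) ||
    null
  )
}

// Browser usage:
// utterance.voice = pickVoice(speechSynthesis.getVoices(), 'zh-CN')
```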

Complete example

javascript
class TextToSpeech {
  constructor() {
    this.synth = window.speechSynthesis
    this.voices = []
    this.loadVoices()
  }

  loadVoices() {
    this.voices = this.synth.getVoices()

    if (this.voices.length === 0) {
      // Voices may load asynchronously; refresh the list when they arrive
      this.synth.onvoiceschanged = () => {
        this.voices = this.synth.getVoices()
      }
    }
  }

  speak(text, options = {}) {
    // Cancel anything that is playing or queued
    this.synth.cancel()

    const utterance = new SpeechSynthesisUtterance(text)

    // Configuration (?? keeps valid zero values such as volume: 0)
    utterance.lang = options.lang || 'zh-CN'
    utterance.pitch = options.pitch ?? 1
    utterance.rate = options.rate ?? 1
    utterance.volume = options.volume ?? 1

    // Choose a voice; null lets the browser use its default
    if (options.voiceName) {
      utterance.voice = this.voices.find((v) => v.name === options.voiceName) ?? null
    } else {
      utterance.voice = this.voices.find((v) => v.lang === utterance.lang) ?? null
    }

    // Event handlers
    utterance.onstart = () => {
      console.log('Playback started')
      options.onStart?.()
    }

    utterance.onend = () => {
      console.log('Playback finished')
      options.onEnd?.()
    }

    utterance.onerror = (event) => {
      console.error('Playback error:', event)
      options.onError?.(event)
    }

    utterance.onpause = () => {
      console.log('Playback paused')
    }

    utterance.onresume = () => {
      console.log('Playback resumed')
    }

    // Speak
    this.synth.speak(utterance)
  }

  pause() {
    this.synth.pause()
  }

  resume() {
    this.synth.resume()
  }

  cancel() {
    this.synth.cancel()
  }

  getVoices(lang) {
    if (lang) {
      return this.voices.filter((v) => v.lang.startsWith(lang))
    }
    return this.voices
  }
}

// Usage
const tts = new TextToSpeech()

tts.speak('你好,世界', { // "Hello, world"
  lang: 'zh-CN',
  rate: 1.2,
  pitch: 1,
  onStart: () => console.log('Started'),
  onEnd: () => console.log('Finished'),
})

Browser compatibility

Speech Recognition

  • ✅ Chrome/Edge (requires the webkit prefix)
  • ✅ Safari (requires the webkit prefix)
  • ❌ Firefox (partial support; an experimental feature must be enabled)
  • ❌ Opera

Speech Synthesis

  • ✅ Chrome/Edge
  • ✅ Safari
  • ✅ Firefox
  • ✅ Opera

Feature detection

javascript
// Detect speech recognition support
const supportsSpeechRecognition = 'SpeechRecognition' in window || 'webkitSpeechRecognition' in window

// Detect speech synthesis support
const supportsSpeechSynthesis = 'speechSynthesis' in window

if (!supportsSpeechRecognition) {
  console.warn('This browser does not support speech recognition')
}

if (!supportsSpeechSynthesis) {
  console.warn('This browser does not support speech synthesis')
}
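
The same checks can be wrapped in a function that takes the global object as a parameter, which also hands back the (possibly prefixed) constructor. This is a sketch under that assumption; detectSpeechSupport is a hypothetical name.

```javascript
// Report which parts of the Web Speech API a global object exposes.
// Pass `window` (or `globalThis`) in a browser.
function detectSpeechSupport(globalObject) {
  const Recognition =
    globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition || null
  const synth = globalObject.speechSynthesis
  return {
    // Constructor to use for recognition, or null when unsupported
    Recognition,
    recognition: typeof Recognition === 'function',
    synthesis:
      typeof synth === 'object' &&
      synth !== null &&
      typeof globalObject.SpeechSynthesisUtterance === 'function',
  }
}

// Browser usage:
// const support = detectSpeechSupport(window)
// if (support.recognition) {
//   const recognition = new support.Recognition()
// }
```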

Practical use cases

1. Voice input

javascript
// Dictate into a text field
const input = document.getElementById('voice-input')
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)()

recognition.continuous = false
recognition.interimResults = false
recognition.lang = 'zh-CN'

recognition.onresult = (event) => {
  const transcript = event.results[0][0].transcript
  input.value = transcript
}

document.getElementById('mic-btn').onclick = () => {
  recognition.start()
}

2. Reading text aloud

javascript
// Read the article content aloud
function readArticle() {
  const article = document.querySelector('article').textContent
  const utterance = new SpeechSynthesisUtterance(article)

  utterance.lang = 'zh-CN'
  utterance.rate = 1

  speechSynthesis.speak(utterance)
}

3. Voice control

javascript
// Voice command control
// openMenu and closeMenu are assumed to be defined elsewhere in the application
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)()
recognition.continuous = true
recognition.lang = 'zh-CN'

recognition.onresult = (event) => {
  const command = event.results[event.results.length - 1][0].transcript.trim()

  switch (command) {
    case '打开菜单': // "open menu"
      openMenu()
      break
    case '关闭菜单': // "close menu"
      closeMenu()
      break
    case '滚动到顶部': // "scroll to top"
      window.scrollTo(0, 0)
      break
    case '滚动到底部': // "scroll to bottom"
      window.scrollTo(0, document.body.scrollHeight)
      break
  }
}
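
An exact string comparison like the switch above is brittle, because a transcript can arrive with extra punctuation, spaces, or different letter case. One possible approach is to normalize both the transcript and the command phrases before comparing. In this sketch, normalizeCommand and matchCommand are hypothetical helpers, and the command table is illustrative.

```javascript
// Strip punctuation, symbols and whitespace, and lower-case the text
function normalizeCommand(transcript) {
  return transcript
    .normalize('NFKC')
    .toLowerCase()
    .replace(/[\p{P}\p{S}\s]+/gu, '')
}

// Look up a transcript in a Map (or array of [phrase, action] pairs).
// Returns the matching action, or null when nothing matches.
function matchCommand(transcript, commands) {
  const spoken = normalizeCommand(transcript)
  for (const [phrase, action] of commands) {
    if (spoken === normalizeCommand(phrase)) return action
  }
  return null
}

// Browser usage:
// const commands = new Map([
//   ['滚动到顶部', () => window.scrollTo(0, 0)], // "scroll to top"
//   ['滚动到底部', () => window.scrollTo(0, document.body.scrollHeight)], // "scroll to bottom"
// ])
// recognition.onresult = (event) => {
//   const last = event.results[event.results.length - 1]
//   matchCommand(last[0].transcript, commands)?.()
// }
```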

4. Accessibility support

javascript
// Read page elements aloud for visually impaired users
function speakElement(element) {
  const text = element.textContent || element.alt || element.getAttribute('aria-label')
  if (text) {
    // Drop anything still queued so rapid focus changes do not pile up
    speechSynthesis.cancel()
    const utterance = new SpeechSynthesisUtterance(text)
    utterance.lang = document.documentElement.lang || 'zh-CN'
    speechSynthesis.speak(utterance)
  }
}

// Read an element's content when it receives focus
document.addEventListener('focusin', (e) => {
  speakElement(e.target)
})

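Notes

Permissions

  • Speech recognition requires the user to grant microphone permission
  • It must be used over HTTPS (localhost is an exception)
  • Some browsers require a user interaction before recognition can start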

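Performance optimization

javascript
// Create one instance and reuse it
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)()

// Stop recognition as soon as it is no longer needed
recognition.onresult = (event) => {
  const lastResult = event.results[event.results.length - 1]
  // Placeholder condition: stop once a final result has arrived
  if (lastResult.isFinal) {
    recognition.stop()
  }
}

// Manage the speech synthesis queue
function speakWithQueue(texts) {
  texts.forEach((text, index) => {
    const utterance = new SpeechSynthesisUtterance(text)
    utterance.onend = () => {
      console.log(`Segment ${index + 1} finished playing`)
    }
    speechSynthesis.speak(utterance)
  })
}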
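
speakWithQueue expects the text to be split into segments already. For a long article, one possible approach is to split on sentence boundaries and pack sentences into chunks of bounded length, so that each utterance stays short and pause or cancel takes effect sooner. splitIntoChunks is a hypothetical helper, and the default limit of 200 characters is an arbitrary assumption, not a documented browser limit.

```javascript
// Split text into chunks of at most maxLength characters, breaking at
// sentence ends where possible. Handles Latin and CJK sentence punctuation.
function splitIntoChunks(text, maxLength = 200) {
  const sentences = text
    .split(/(?<=[.!?])\s+|(?<=[。!?])\s*|\n+/)
    .map((s) => s.trim())
    .filter(Boolean)

  const chunks = []
  let current = ''
  for (let sentence of sentences) {
    // Hard-split a single sentence that is longer than the limit
    while (sentence.length > maxLength) {
      if (current) {
        chunks.push(current)
        current = ''
      }
      chunks.push(sentence.slice(0, maxLength))
      sentence = sentence.slice(maxLength)
    }
    if (!sentence) continue

    const joined = current ? `${current} ${sentence}` : sentence
    if (joined.length <= maxLength) {
      current = joined
    } else {
      chunks.push(current)
      current = sentence
    }
  }
  if (current) chunks.push(current)
  return chunks
}

// Browser usage:
// speakWithQueue(splitIntoChunks(document.querySelector('article').textContent))
```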

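Best practices

  1. Provide visual feedback: use an icon or animation to show the recognition state
  2. Handle errors: deal with each error case gracefully
  3. Give the user control: provide start, stop, and pause buttons
  4. Explain privacy: tell users how their voice data is handled
  5. Provide a fallback: offer an alternative for browsers without support

javascript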
// Fallback handling for unsupported browsers
class SpeechHelper {
  constructor() {
    this.hasRecognition = 'SpeechRecognition' in window || 'webkitSpeechRecognition' in window
    this.hasSynthesis = 'speechSynthesis' in window
  }

  startRecognition(callback) {
    if (!this.hasRecognition) {
      alert('Your browser does not support speech recognition. Please type your input instead.')
      return
    }

    // Supported: proceed as usual
    const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)()
    recognition.onresult = callback
    recognition.start()
  }

  speak(text) {
    if (!this.hasSynthesis) {
      console.warn('This browser does not support speech synthesis')
      return
    }

    const utterance = new SpeechSynthesisUtterance(text)
    speechSynthesis.speak(utterance)
  }
}

Released under the MIT License