How to Do Speech Recognition with Python

白雨思
2019-11-19 17:05:05

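Speech recognition with Python

A number of mature speech-to-text models are available online, such as the Google Cloud Speech API, although that one requires working on Google Cloud Platform. For Python developers, finding a suitable package is not hard. The SpeechRecognition package on GitHub supports real-time transcription from a microphone once the microphone dependencies are installed on the machine. It can also extract text from audio files, and it can call recognition engines from several providers, including the Google API, CMU Sphinx, Microsoft Bing Speech, IBM Speech to Text, and Wit.ai.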
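Because the package exposes several engines, one possible pattern is to try them in order and keep the first result. The sketch below is only an illustration of that idea: `first_successful` and the two stub functions are hypothetical names, not part of SpeechRecognition. With the real library, the callables could be methods such as `r.recognize_sphinx` and `r.recognize_google`, and `errors` could be `(sr.UnknownValueError, sr.RequestError)`.

```python
def first_successful(audio, engines, errors=(Exception,)):
    """Try each (name, recognize) pair in order and return the first result.

    `engines` is a list of (name, callable) pairs; `errors` lists the
    exception types that should trigger a fallback to the next engine.
    """
    failures = []
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except errors as exc:
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all engines failed: " + "; ".join(failures))


# Stand-in engines so the sketch runs without any audio or network access.
def offline_stub(audio):
    raise ValueError("could not understand audio")


def online_stub(audio):
    return "hello"


print(first_successful(b"", [("offline", offline_stub), ("online", online_stub)]))
```

Passing the exception types explicitly keeps the helper from swallowing unrelated bugs once real engines are plugged in.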

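Offline speech recognition

On networks in mainland China, the Google API cannot be used directly for speech-to-text because it needs a connection to Google's servers, although renting an overseas VPS is one way around that. This article shows how to convert an audio file to text in Python without a network connection. The toolkit used here is Sphinx, developed at Carnegie Mellon University in the United States, which supports large-vocabulary, speaker-independent, continuous English speech recognition.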

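Installing Sphinx

This walkthrough uses Ubuntu, specifically Ubuntu 16.04.3 LTS. Before installing Sphinx, install the packages it needs:

```bash
sudo apt-get install gcc automake autoconf libtool bison swig python-dev libpulse-dev
```

Next, download the sphinxbase package from the project's website, or clone the code directly from GitHub:

```bash
git clone https://github.com/cmusphinx/sphinxbase.git
```

If you downloaded an archive, extract it first. Then enter the directory, build, and install:

```bash
cd sphinxbase
./autogen.sh
make
sudo make install
```

After installation, add the library path so the system can load the shared libraries:

```bash
echo "/usr/local/lib" | sudo tee -a /etc/ld.so.conf
sudo ldconfig
```

The `sphinx_lm_convert` command can now convert the language model's DMP file to binary format:

```bash
sphinx_lm_convert -i zh_broadcastnews_64000_utf8.DMP -o zh_CN.lm.bin
```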
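After a source build like this, a quick check that the command-line tool is actually reachable can save some confusion later. The snippet below is a small sketch using only the standard library. `find_tools` is a hypothetical helper name, and the list contains only the one command this article uses.

```python
import shutil

# Commands to look for; sphinx_lm_convert is the one used in this article.
TOOLS = ["sphinx_lm_convert"]


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools(TOOLS).items():
    status = path if path else "not found on PATH"
    print(f"{name}: {status}")
```

If the tool is reported as missing, the install prefix (here `/usr/local`) may not be on `PATH`.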

Using Sphinx from Python

To use Sphinx from Python, install a few dependencies:

```bash
pip install pydub
pip install SpeechRecognition
sudo apt-get install build-essential swig libpulse-dev
pip install -U pocketsphinx
sudo apt-get install libav-tools
```

Once they are installed, the following code performs the recognition:

```python
import os

from pydub import AudioSegment
import speech_recognition as sr

# Convert the MP3 file to a WAV file
# (pydub relies on the libav tools installed above to decode MP3)
mp3_file = "/path/to/file.mp3"
wav_file = os.path.splitext(mp3_file)[0] + ".wav"
sound = AudioSegment.from_mp3(mp3_file)
sound.export(wav_file, format="wav")

# Speech recognition
r = sr.Recognizer()
audio_file = sr.AudioFile(wav_file)
with audio_file as source:
    audio = r.record(source)  # read the whole file

text = r.recognize_sphinx(audio, language="zh-CN")
print(text)
```

In testing, the Sphinx model did reasonably well on short sentences but was much less accurate on longer ones, possibly because of limited training data. To improve accuracy, consider training your own model.
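When results look off, it can help to confirm what the MP3-to-WAV conversion actually produced before blaming the model. The sketch below reads a WAV file's basic properties with Python's standard `wave` module. `wav_info` is a hypothetical helper name, and it assumes an uncompressed PCM WAV file like the one pydub exports by default.

```python
import wave


def wav_info(path):
    """Return basic properties of a PCM WAV file using the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / float(rate),
        }


# Usage (the path is a placeholder):
# print(wav_info("/path/to/file.wav"))
```

The reported duration is also a handy input when deciding whether a file is short enough to recognize in one pass.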

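Speech recognition with the Google API

Recognition through the Google API is very accurate, and it only requires a connection to Google's servers. Here is an example:

```python
import speech_recognition as sr

r = sr.Recognizer()
audio_file = sr.AudioFile("/path/to/audio.wav")

with audio_file as source:
    audio = r.record(source, duration=15)  # read only the first 15 seconds

text = r.recognize_google(audio, language="zh-CN")
print(text)
```

Although the Google API is more accurate, large audio files can cause performance problems and timeout errors. Fortunately, the `SpeechRecognition` library can process a slice of the audio at a time, for example through the `duration` argument to `record()`, which keeps each request small.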
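One way to apply that idea to a long file is to walk through it in fixed-size windows. The following is a minimal sketch, not a tuned implementation. `plan_chunks` and `transcribe_in_chunks` are hypothetical helper names, and the 15-second window simply mirrors the example above. It relies on successive `record()` calls on the same open `AudioFile` continuing from where the previous call stopped, and it assumes a PCM WAV file so the standard `wave` module can report its length.

```python
import wave


def plan_chunks(total_seconds, chunk_seconds):
    """Split a duration into consecutive (offset, duration) windows."""
    total_seconds = float(total_seconds)
    chunk_seconds = float(chunk_seconds)
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    offset = 0.0
    while offset < total_seconds:
        duration = min(chunk_seconds, total_seconds - offset)
        chunks.append((offset, duration))
        offset += duration
    return chunks


def transcribe_in_chunks(wav_path, chunk_seconds=15, language="zh-CN"):
    """Hypothetical helper: recognize a WAV file one window at a time."""
    # Imported here so plan_chunks can be used without the package installed.
    import speech_recognition as sr

    with wave.open(wav_path, "rb") as wav:
        total_seconds = wav.getnframes() / float(wav.getframerate())

    r = sr.Recognizer()
    pieces = []
    with sr.AudioFile(wav_path) as source:
        for _offset, duration in plan_chunks(total_seconds, chunk_seconds):
            # Each call reads the next `duration` seconds of the stream.
            audio = r.record(source, duration=duration)
            try:
                pieces.append(r.recognize_google(audio, language=language))
            except sr.UnknownValueError:
                pieces.append("")  # nothing intelligible in this window
    return pieces


print(plan_chunks(40, 15))
```

For a 40-second file this plans windows of 15, 15, and 10 seconds. Fixed windows can cut a word in half at a boundary, so overlapping windows or splitting on silence may give better results. Network failures (`sr.RequestError`) are deliberately left to propagate to the caller.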