
Speech Recognition Series | Parsing Audio with Python (Part 1)

Date: 2020-11-23 19:16:46

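I have recently been evaluating open-source speech recognition models, starting with Baidu's PaddlePaddle.

Before testing, it helps to understand the basics of audio parsing, hence this introductory article.

The main audio-parsing libraries I have come across are:

soundfile
ffmpy
librosa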

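Contents

1 librosa
1.1 Reading audio
1.2 Writing audio
1.3 Reading with librosa + writing with PySoundFile
1.4 Converting from other libraries to librosa format
2 PySoundFile
2.1 Reading audio
2.2 Exporting audio
3 ffmpy
4 AudioSegment / pydub
5 paddleaudio
6 Audio splitting - auditok
7 A hard-to-solve error
8 Downloading audio from a URL
8.1 soundfile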

1 librosa

Installation:

    !pip install librosa -i /pypi/simple
    !pip install soundfile -i /pypi/simple

Reference documentation: librosa

1.1 Reading audio

Documentation: /doc/latest/core.html#audio-loading

signal, sr = librosa.load(path, sr=None)

The parameters of load are:

librosa.load(path, *, sr=22050, mono=True, offset=0.0, duration=None, dtype=<class 'numpy.float32'>, res_type='kaiser_best')

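With sr=None the file's original sampling rate is kept. Any other value makes librosa resample the audio to that rate, which takes noticeably longer.

load can read both .wav and .mp3 files.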
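To get a feel for what resampling does to the data, the sketch below estimates how many samples a clip holds after its sampling rate changes. It assumes the output length is the input length scaled by target_sr / orig_sr and rounded up, which is my understanding of how librosa sizes its resampled output; resampled_length is a hypothetical helper, not a librosa function.

```python
def resampled_length(n_samples: int, orig_sr: int, target_sr: int) -> int:
    """Estimate the sample count after resampling (hypothetical helper)."""
    # The duration is unchanged, so the sample count scales with the rate ratio.
    # Integer ceiling division avoids floating-point rounding surprises.
    return (n_samples * target_sr + orig_sr - 1) // orig_sr


# A 10-second clip at 44100 Hz resampled to 16000 Hz
print(resampled_length(441000, 44100, 16000))  # 160000
```

The clip still lasts 10 seconds; only the number of values used to describe it changes.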

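1.2 Writing audio

Two other articles online, "Python audio sampling rate conversion" and "Python audio file sampling rate conversion", contain code that fails when exporting the audio file. Their code is reproduced here.

Code snippet 1: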

    def resample_rate(path, new_sample_rate=16000):
        signal, sr = librosa.load(path, sr=None)
        wavfile = path.split('/')[-1]
        wavfile = wavfile.split('.')[0]
        file_name = wavfile + '_new.wav'
        new_signal = librosa.resample(signal, sr, new_sample_rate)
        librosa.output.write_wav(file_name, new_signal, new_sample_rate)

Code snippet 2:

    import librosa
    import os

    noise_name = "/media/dfy/fc0b6513-c379-4548-b391-876575f1493f/home/dfy/PycharmProjects/noise_data/"
    noise_name_list = os.listdir(noise_name)
    for one_name in noise_name_list:
        data = librosa.load(noise_name + one_name, 16000)
        librosa.output.write_wav(noise_name + one_name, data[0], 16000, norm=False)

    if __name__ == '__main__':
        pass

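Both snippets export through librosa.output, which recent librosa releases no longer provide. The result is this error:

    AttributeError: module librosa has no attribute output

Related: "No module named numba.decorators" error fix

Version 0.8.0 removed the output API, so either downgrade librosa (for example to 0.7.2) or write the file some other way.

That led me back to the official documentation: librosa

The recommended way to write audio is the PySoundFile library.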
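A script that has to run on both old and new librosa installs could pick the writer from the version string. The sketch below shows one way to do that comparison; output_api_removed is a hypothetical helper, and it assumes the version string starts with numeric major and minor parts, as librosa.__version__ normally does.

```python
def output_api_removed(librosa_version: str) -> bool:
    """Return True when librosa.output is expected to be missing (0.8.0 or later)."""
    # Compare only the numeric major/minor parts, e.g. "0.10.1" -> (0, 10)
    major, minor = (int(part) for part in librosa_version.split(".")[:2])
    return (major, minor) >= (0, 8)


for version in ("0.7.2", "0.8.0", "0.10.1"):
    writer = "soundfile.write" if output_api_removed(version) else "librosa.output.write_wav"
    print(version, "->", writer)
```

In practice you would pass librosa.__version__; comparing tuples rather than strings keeps "0.10" from sorting before "0.8".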

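1.3 Reading with librosa + writing with PySoundFile

If you see this error:

Input audio file has sample rate [44100], but decoder expects [16000]

the file's sampling rate does not match what the model expects, and the audio has to be resampled.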
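Before handing a file to the decoder, its sampling rate can be checked up front. The sketch below does this with the standard-library wave module, so it only covers uncompressed PCM WAV files; has_sample_rate and the demo file are my own illustrative names.

```python
import os
import tempfile
import wave


def has_sample_rate(path, expected_sr=16000):
    """Return True if the WAV file at `path` uses the expected sampling rate."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == expected_sr


# Demo: write one second of 16-bit mono silence at 44100 Hz, then check it
demo_path = os.path.join(tempfile.mkdtemp(), "demo_44100.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 44100)

print(has_sample_rate(demo_path))         # False: would need resampling to 16000
print(has_sample_rate(demo_path, 44100))  # True
```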

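Combining the two libraries above, I adapted the code from the two articles mentioned earlier into the following function for changing the sampling rate:

    import os

    import librosa
    import soundfile as sf

    def resample_rate(path, new_sample_rate=16000):
        # sr=None keeps the file's original sampling rate
        signal, sr = librosa.load(path, sr=None)
        # Build the output name from the input name, e.g. xxx.wav -> xxx_new.wav
        wavfile = os.path.splitext(os.path.basename(path))[0]
        file_name = wavfile + '_new.wav'
        # Keyword arguments work with both older and newer librosa releases
        new_signal = librosa.resample(signal, orig_sr=sr, target_sr=new_sample_rate)
        # librosa.output.write_wav was removed in 0.8.0, so write with soundfile
        sf.write(file_name, new_signal, new_sample_rate, subtype='PCM_24')
        print(f'{file_name} has been saved.')

    wav_file = 'video/xxx.wav'
    resample_rate(wav_file, new_sample_rate=16000)

This produces a copy of the audio file with a sampling rate of 16000 Hz.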

1.4 Converting from other libraries to librosa format

Reference: /doc/latest/generated/librosa.load.html#librosa.load

Method 1:

    # Load using an already open SoundFile object
    import librosa
    import soundfile

    sfo = soundfile.SoundFile(librosa.ex('brahms'))
    y, sr = librosa.load(sfo)

Method 2:

    # Load using an already open audioread object
    import audioread.ffdec  # Use ffmpeg decoder
    import librosa

    aro = audioread.ffdec.FFmpegAudioFile(librosa.ex('brahms'))
    y, sr = librosa.load(aro)

2 PySoundFile

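python-soundfile is an audio library based on libsndfile, CFFI, and NumPy.

Sound files can be read and written directly with the functions read() and write(). To read a sound file block by block, use blocks(). A sound file can also be opened as a SoundFile object.

Official PySoundFile documentation: readthedocs

Installation: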

!pip install soundfile -i /pypi/simple

2.1 Reading audio

Read files from zip-compressed archives:

    import io
    import zipfile as zf

    import soundfile as sf

    with zf.ZipFile('test.zip') as myzip:
        with myzip.open('stereo_file.wav') as myfile:
            tmp = io.BytesIO(myfile.read())
            data, samplerate = sf.read(tmp)

Download and read from URL:

    import io

    import soundfile as sf
    from six.moves.urllib.request import urlopen

    url = "/librosa/librosa/master/tests/data/test1_44100.wav"
    data, samplerate = sf.read(io.BytesIO(urlopen(url).read()))

2.2 Exporting audio

Exporting audio:

    import numpy as np
    import soundfile as sf

    samplerate = 44100
    # 10 seconds of stereo noise in [-1, 1)
    data = np.random.uniform(-1, 1, size=(samplerate * 10, 2))

    # Write out audio as 24bit PCM WAV
    sf.write('stereo_file.wav', data, samplerate, subtype='PCM_24')

    # Write out audio as 24bit Flac
    sf.write('stereo_file.flac', data, samplerate, format='flac', subtype='PCM_24')

    # Write out audio as 16bit OGG
    sf.write('stereo_file.ogg', data, samplerate, format='ogg', subtype='vorbis')

3 ffmpy

Reference: "Batch-converting the audio sampling rate of videos with Python (with code) | Python tools"

Installation:

pip install ffmpy -i /simple

See the original article for the full code. Only one excerpt is shown here:

    import os

    from ffmpy import FFmpeg

    def transfor(video_path: str, tmp_dir: str, result_dir: str):
        file_name = os.path.basename(video_path)
        base_name = file_name.split('.')[0]
        file_ext = file_name.split('.')[-1]
        ext = 'wav'

        # Step 1: extract the audio track as 16 kHz mono WAV
        audio_path = os.path.join(tmp_dir, '{}.{}'.format(base_name, ext))
        print('File name: {}, extracting audio'.format(audio_path))
        ff = FFmpeg(inputs={video_path: None},
                    outputs={audio_path: '-f {} -vn -ac 1 -ar 16000 -y'.format('wav')})
        print(ff.cmd)
        ff.run()
        if os.path.exists(audio_path) is False:
            return None

        # Step 2: write a copy of the video with its audio stripped (-an)
        video_tmp_path = os.path.join(tmp_dir, '{}_1.{}'.format(base_name, file_ext))
        ff_video = FFmpeg(inputs={video_path: None}, outputs={video_tmp_path: '-an'})
        print(ff_video.cmd)
        ff_video.run()

        # Step 3: combine the silent video with the resampled audio
        result_video_path = os.path.join(result_dir, file_name)
        ff_fuse = FFmpeg(inputs={video_tmp_path: None, audio_path: None},
                         outputs={result_video_path: '-map 0:v -map 1:a -c:v copy -c:a aac -shortest'})
        print(ff_fuse.cmd)
        ff_fuse.run()
        return result_video_path

4 AudioSegment / pydub

Reference article:

"Python | Speech processing | Comparing librosa / AudioSegment / soundfile for reading audio files"

Another article, describing pydub's parameters:

"A brief introduction to pydub"

Official site: pydub

    import time

    import numpy as np
    from pydub import AudioSegment  # third-party library; install it before first use

    audio_path = './data/example.mp3'

    t = time.time()
    song = AudioSegment.from_file(audio_path, format='mp3')
    # print(len(song))          # duration in milliseconds
    # print(song.frame_rate)    # sampling rate in Hz
    # print(song.sample_width)  # sample width in bytes
    # print(song.channels)      # number of channels; MP3s are often stereo, and more channels mean larger files
    wav = np.array(song.get_array_of_samples())
    sr = song.frame_rate
    print(f"sr={sr}, len={len(wav)}, elapsed: {time.time()-t}")
    print(f"(min, max, mean) = ({wav.min()}, {wav.max()}, {wav.mean()})")
    wav

Output:

    sr=16000, len=64320, elapsed: 0.04667925834655762
    (min, max, mean) = (-872, 740, -0.6079446517412935)
    array([ 1, -1, -2, ..., -1, 1, -2], dtype=int16)
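Note that pydub hands back integer samples (int16 here), whereas librosa.load returns floats. If the two need to be compared or mixed, the integers can be scaled into the [-1.0, 1.0) range. This is a minimal sketch assuming 16-bit samples (sample_width of 2); int16_to_float is a hypothetical helper.

```python
from array import array


def int16_to_float(samples):
    """Scale signed 16-bit samples to floats in [-1.0, 1.0)."""
    # 32768 is 2**15, the magnitude of the most negative int16 value
    return [s / 32768.0 for s in samples]


# get_array_of_samples() returns an array.array; 'h' is the signed 16-bit type code
pcm = array("h", [0, 16384, -32768])
print(int16_to_float(pcm))  # [0.0, 0.5, -1.0]
```

With the NumPy array from the snippet above, the same scaling would be wav.astype(np.float32) / 32768.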

5 paddleaudio

Installation:

! pip install paddleaudio -i /pypi/simple

This is an official Paddle package. Its basic audio operations appear to be built on librosa.

For details, see:

https://paddleaudio-doc.readthedocs.io/en/latest/index.html

    import paddleaudio

    audio_file = 'XXX.wav'
    paddleaudio.load(audio_file, sr=None, mono=True, normal=False)

Result:

    (array([-3.9100647e-04, -3.0159950e-05, 1.1110306e-04, ...,
            1.4603138e-04, 2.5625229e-03, -7.6780319e-03], dtype=float32),
     16000)

That is, the audio samples plus the sampling rate.

6 Audio splitting - auditok

Reference: "[Super simple] Building a personal speech dictation service with PaddleSpeech"

!pip install auditok

The reason for splitting, as the referenced article explains, is that PaddleSpeech recognizes at most 50 s of speech at a time, so longer audio has to be cut into pieces. Here we simply call auditok:

    import os
    import warnings

    import auditok

    warnings.filterwarnings('ignore')

    # ty is the input category, 'audio' by default
    def qiefen(path, ty='audio', mmin_dur=1, mmax_dur=100000, mmax_silence=1, menergy_threshold=55):
        audio_file = path
        audio_regions = auditok.split(
            audio_file,
            min_dur=mmin_dur,  # minimum duration of a valid audio event in seconds
            max_dur=mmax_dur,  # maximum duration of an event
            max_silence=mmax_silence,  # maximum duration of tolerated continuous silence within an event
            energy_threshold=menergy_threshold,  # threshold of detection
        )

        # Output directory: change/<ty>/<input file name without extension>
        file_pre = os.path.splitext(os.path.basename(audio_file))[0]
        out_dir = os.path.join('change', ty, file_pre)
        os.makedirs(out_dir, exist_ok=True)

        for i, r in enumerate(audio_regions):
            # Regions returned by `split` have 'start' and 'end' metadata fields
            print("Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))
            # Zero-pad the index to three digits so that the files sort correctly
            file_save = os.path.join(out_dir, '{:03d}'.format(i) + '-{meta.start:.3f}-{meta.end:.3f}.wav')
            filename = r.save(file_save)
            print("region saved as: {}".format(filename))
        return out_dir

The core call is auditok.split; its parameters are documented under auditok.core.split. Here it is given the audio file name rather than decoded sample data.
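To make the energy_threshold parameter less abstract, here is a rough sketch of an energy check on a single frame of samples. It assumes the energy is 20 * log10 of the frame's RMS amplitude, which is my understanding of how auditok's energy detection works; the exact formula, the clamp value, and the name frame_energy_db are assumptions, not auditok's API.

```python
import math


def frame_energy_db(samples):
    """Energy of one frame as 20 * log10(RMS) (hypothetical helper)."""
    if not samples:
        raise ValueError("frame must contain at least one sample")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Clamp so that pure digital silence does not hit log10(0)
    return 20.0 * math.log10(max(rms, 1e-10))


loud = [1000, -1000] * 80   # RMS 1000, roughly 60 dB
quiet = [100, -100] * 80    # RMS 100, roughly 40 dB
threshold = 55
for name, frame in (("loud", loud), ("quiet", quiet)):
    print(name, frame_energy_db(frame) > threshold)
```

Under this assumption, a threshold of 55 keeps the louder frame as speech and treats the quieter one as silence; raising the threshold makes the splitter stricter about what counts as an audio event.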

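7 A hard-to-solve error

AudioParameterError: Sample width must be one of: 1, 2 or 4 (bytes)

I ran into the error above while running speech recognition with a model.

I searched online for a while without finding a working solution.

Just as I was about to give up, I came across a handy feature of the AudioSegment library.

What is sample width?

It is the quantization width of each sample (sampwidth), expressed in bytes.

    import wave

    file = 'asr_example.wav'
    with wave.open(file) as fp:
        channels = fp.getnchannels()
        srate = fp.getframerate()
        swidth = fp.getsampwidth()
        data = fp.readframes(-1)
    swidth, srate

The wave module can report several of the more important parameters of an audio file.

They are:

nchannels: number of channels
sampwidth: sample width in bytes
framerate: sampling rate
nframes: number of audio frames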
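These four values are enough to work out a file's duration and the size of its raw audio data: duration is nframes / framerate, and the data size is nframes * nchannels * sampwidth. The sketch below checks that arithmetic on a small WAV built in memory; describe_wav is a hypothetical helper.

```python
import io
import wave


def describe_wav(fileobj):
    """Return the duration and raw data size of a PCM WAV file."""
    with wave.open(fileobj, "rb") as wav:
        nchannels = wav.getnchannels()
        sampwidth = wav.getsampwidth()
        framerate = wav.getframerate()
        nframes = wav.getnframes()
    return {
        "duration_s": nframes / framerate,
        "data_bytes": nframes * nchannels * sampwidth,
    }


# Demo: 32000 frames of 16-bit stereo silence at 16000 Hz
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00" * (2 * 2 * 32000))
buffer.seek(0)

info = describe_wav(buffer)
print(info)  # {'duration_s': 2.0, 'data_bytes': 128000}
```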

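When this error occurs, the audio has to be re-encoded, and the AudioSegment library can do that directly:

    from pydub import AudioSegment

    file_in = 'asr_example.wav'      # input audio file
    file_out = 'asr_example_3.wav'   # output audio file
    sound = AudioSegment.from_file(file_in)
    sound = sound.set_frame_rate(48000)  # optionally change the sampling rate
    sound = sound.set_sample_width(4)    # reset the sample width in bytes
    sound.export(file_out, format="wav")

That resolved the error.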
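Since the error message lists 1, 2, and 4 bytes as the accepted widths, one plausible trigger is a 24-bit file, which has 3 bytes per sample. That is what subtype='PCM_24' in section 1.3 produces, though this cause is my guess and was not confirmed. The sketch below checks the width up front with the standard-library wave module; sample_width_ok and make_wav are hypothetical helpers.

```python
import io
import wave

SUPPORTED_WIDTHS = (1, 2, 4)  # bytes per sample, taken from the error message


def sample_width_ok(fileobj):
    """Return True if the WAV file's sample width is 1, 2 or 4 bytes."""
    with wave.open(fileobj, "rb") as wav:
        return wav.getsampwidth() in SUPPORTED_WIDTHS


def make_wav(sampwidth, framerate=16000, nframes=160):
    """Build a small mono PCM WAV of zero bytes in memory (demo helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sampwidth)
        wav.setframerate(framerate)
        wav.writeframes(b"\x00" * (sampwidth * nframes))
    buffer.seek(0)
    return buffer


print(sample_width_ok(make_wav(3)))  # False: 24-bit PCM, 3 bytes per sample
print(sample_width_ok(make_wav(2)))  # True: 16-bit PCM
```

If the check fails, converting with set_sample_width as shown above, or writing with a 16-bit subtype in the first place, should avoid the error.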

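8 Downloading audio from a URL

Several ways of reading are possible:

8.1 soundfile

    import io
    from urllib.request import urlopen

    import soundfile as sf

    def save_audio_func(video_url, save_samplerate=16000):
        '''Download the audio at video_url and save it locally.'''
        save_name = video_url.split('/')[-1]
        data, samplerate = sf.read(io.BytesIO(urlopen(video_url).read()))
        # Write out audio as 24bit PCM WAV.
        # Note: sf.write does not resample. save_samplerate should match samplerate;
        # otherwise resample data first (see section 1.3).
        sf.write(save_name, data, save_samplerate, subtype='PCM_24')
        return save_name

Both reading and writing go through soundfile.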
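Taking the last path segment with split('/') works for plain URLs but keeps any query string in the file name. The sketch below derives the name from the URL path instead, using only the standard library; filename_from_url, the default name, and the example.com URLs are illustrative assumptions.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="audio.wav"):
    """Derive a local file name from the URL path (hypothetical helper)."""
    path = urlparse(url).path  # drops any "?query" and "#fragment"
    name = posixpath.basename(unquote(path))
    # Fall back to a default when the URL ends in "/" and has no file name
    return name or default


print(filename_from_url("https://example.com/data/test1_44100.wav?token=abc"))  # test1_44100.wav
print(filename_from_url("https://example.com/"))  # audio.wav
```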
