[Speech Recognition] Voiceprint Recognition with a MATLAB GUI [MATLAB source code included, No. 1022]

Posted: 2021-05-31 04:17:55

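I. Introduction to Voiceprint Recognition

This article designs and implements a text-dependent voiceprint recognition system in MATLAB that can determine a speaker's identity.

1 System principles

a Voiceprint recognition

With the growth of artificial intelligence in recent years, many mobile apps have introduced a voiceprint lock feature, which relies mainly on voiceprint recognition techniques. Voiceprint recognition, also called speaker recognition, differs from speech recognition: it identifies who is speaking, whereas speech recognition determines what is being said.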

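b Mel-frequency cepstral coefficients (MFCC)

Mel-frequency cepstral coefficients (MFCC) are among the most widely used features in speech signal processing.

Experiments show that the human ear behaves like a filter bank, attending only to certain frequency bands of the spectrum. Perceived pitch is not linear in physical frequency (Hz); it is approximately linear on the mel scale.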

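MFCC takes these characteristics of human hearing into account: the linear spectrum is first mapped onto the perceptually motivated, nonlinear mel spectrum and then transformed to the cepstrum. A commonly used relation for converting ordinary frequency f (in Hz) to mel frequency is:

$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$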
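The Hz-to-mel mapping can be tried out with a few lines of MATLAB. This is a minimal sketch that assumes the widely used 2595*log10(1 + f/700) form of the mel scale; the filter count `nFilt` and sampling rate `fsHz` are illustrative values, not taken from the original system.

```matlab
% Hz <-> mel conversion (assumed form: mel = 2595*log10(1 + f/700)).
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10 .^ (m / 2595) - 1);

f = [0 500 1000 4000 8000];   % example frequencies in Hz
m = hz2mel(f);                % corresponding mel values
fBack = mel2hz(m);            % round trip back to Hz

% Centre frequencies of a mel filter bank: equally spaced on the mel
% axis, hence increasingly far apart in Hz.
nFilt = 24;                   % assumed number of triangular filters
fsHz  = 16000;                % assumed sampling rate in Hz
melEdges  = linspace(hz2mel(0), hz2mel(fsHz / 2), nFilt + 2);
centresHz = mel2hz(melEdges(2:end-1));

disp([f; m]);                 % Hz in the first row, mel in the second
```

Because the centres are uniform in mel, the filters are narrow at low frequencies and wide at high frequencies, which mirrors the filter-bank behaviour of the ear described above.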

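c Vector quantization (VQ)

The system uses vector quantization to compress the MFCC features extracted from speech.

Vector quantization (VQ) is a lossy data compression method based on block coding. Quantization is likewise a key step in multimedia compression formats such as JPEG and MPEG-4, though JPEG quantizes each DCT coefficient separately (scalar quantization) rather than whole vectors. The basic idea of VQ is to group several scalar values into a vector and quantize that vector as a whole in the vector space, which compresses the data while losing little information.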
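A VQ codebook can be sketched with a plain k-means loop. This is an illustration only: the source does not say which codebook training algorithm it uses (LBG is a common choice), and the data `X`, the codebook size `K` and the iteration count below are assumptions. The synthetic `X` stands in for a matrix of MFCC frames.

```matlab
% Synthetic stand-in for MFCC frames: nFrames-by-nDim, two clusters.
t = (1:100)';
X = [0.5*sin(t),     0.5*cos(t); ...
     6 + 0.5*sin(t), 6 + 0.5*cos(t)];

K = 2;                                    % codebook size (assumed)
idx = round(linspace(1, size(X, 1), K));  % spread-out initial codewords
C = X(idx, :);                            % K-by-nDim codebook

for iter = 1:20
    % Squared Euclidean distance from every frame to every codeword
    % (implicit expansion requires R2016b or later).
    D = zeros(size(X, 1), K);
    for k = 1:K
        D(:, k) = sum((X - C(k, :)).^2, 2);
    end
    [minDist, label] = min(D, [], 2);     % nearest codeword per frame

    % Move each codeword to the mean of the frames assigned to it.
    for k = 1:K
        if any(label == k)
            C(k, :) = mean(X(label == k, :), 1);
        end
    end
end

avgDistortion = mean(minDist);            % from the final assignment step
disp(C);
```

After training, only the K codewords in `C` need to be stored per speaker instead of every frame, which is the compression the text refers to.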

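3 System structure

The overall structure of the system is shown in the figure below:

[Figure: system structure]

3.1 Training

The speech signal is first preprocessed; MFCC feature parameters are then extracted and compressed by vector quantization, yielding a codebook for the speaker's utterance. The same speaker says the same content several times, repeating this training procedure, which eventually builds up a codebook library.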
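The source does not spell out what its preprocessing consists of. A typical front end before MFCC extraction is pre-emphasis, framing and windowing, sketched below under that assumption. The coefficient `a`, the 25 ms frame length, the 10 ms frame shift and the synthetic tone used in place of a recording are all illustrative choices.

```matlab
fs = 8000;                          % assumed sampling rate in Hz
t  = (0:fs-1)' / fs;                % one second of samples
x  = sin(2*pi*440*t);               % synthetic stand-in for an utterance

% Pre-emphasis: y[n] = x[n] - a*x[n-1], boosts high frequencies.
a = 0.97;                           % assumed coefficient
y = filter([1 -a], 1, x);

% Split the signal into overlapping frames.
frameLen   = round(0.025 * fs);     % 25 ms frames
frameShift = round(0.010 * fs);     % 10 ms hop
nFrames    = 1 + floor((length(y) - frameLen) / frameShift);

% Hamming window written out explicitly so no toolbox is needed.
n   = (0:frameLen-1)';
win = 0.54 - 0.46 * cos(2*pi*n / (frameLen - 1));

frames = zeros(frameLen, nFrames);  % one windowed frame per column
for k = 1:nFrames
    first = (k - 1) * frameShift + 1;
    frames(:, k) = y(first:first + frameLen - 1) .* win;
end

disp(size(frames));
```

Each column of `frames` would then go through the FFT, mel filter bank, logarithm and DCT stages to give one MFCC vector.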

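3.2 Recognition

During recognition the speech signal is likewise preprocessed and its MFCC features extracted, and the Euclidean distance between these features and the codebooks in the training library is computed. When the distance falls below a threshold, the speaker and the spoken content are judged to match those in the training codebook library, and the match succeeds.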
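The matching step can be sketched as follows. The codebooks, speaker names, test features and threshold are hypothetical toy values chosen for illustration; the source gives neither its threshold nor how distances are averaged, so taking the mean nearest-codeword distance over frames is an assumption.

```matlab
% Hypothetical codebook library: one K-by-nDim codebook per speaker.
codebooks = { [0 0; 1 1], [5 5; 6 6] };
names     = { 'speakerA', 'speakerB' };

% Feature frames of the test utterance (nFrames-by-nDim, toy values).
testFeat = [0.1 -0.1; 0.9 1.2; 0.2 0.1];

threshold = 0.5;                        % assumed acceptance threshold

avgDist = zeros(1, numel(codebooks));
for s = 1:numel(codebooks)
    C = codebooks{s};
    d = zeros(size(testFeat, 1), size(C, 1));
    for k = 1:size(C, 1)
        % Euclidean distance from every frame to codeword k
        d(:, k) = sqrt(sum((testFeat - C(k, :)).^2, 2));
    end
    % Average distance of each frame to its nearest codeword
    avgDist(s) = mean(min(d, [], 2));
end

[bestDist, best] = min(avgDist);
if bestDist < threshold
    result = names{best};               % accepted as this speaker
else
    result = 'rejected';                % no codebook is close enough
end
disp(result);
```

In practice the threshold is tuned on held-out recordings: too low and genuine speakers are rejected, too high and impostors are accepted.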

II. Partial source code

function varargout = GUI(varargin)
% GUI MATLAB code for GUI.fig
%      GUI, by itself, creates a new GUI or raises the existing
%      singleton*.
%
%      H = GUI returns the handle to a new GUI or the handle to
%      the existing singleton*.
%
%      GUI('CALLBACK',hObject,eventData,handles,...) calls the local
%      function named CALLBACK in GUI.M with the given input arguments.
%
%      GUI('Property','Value',...) creates a new GUI or raises the
%      existing singleton*. Starting from the left, property value pairs are
%      applied to the GUI before GUI_OpeningFcn gets called. An
%      unrecognized property name or invalid value makes property application
%      stop. All inputs are passed to GUI_OpeningFcn via varargin.
%
%      *See GUI Options on GUIDE's Tools menu. Choose "GUI allows only one
%      instance to run (singleton)".
%
% See also: GUIDE, GUIDATA, GUIHANDLES

% Edit the above text to modify the response to help GUI

% Last Modified by GUIDE v2.5 15-Mar- 17:37:45

% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @GUI_OpeningFcn, ...
                   'gui_OutputFcn',  @GUI_OutputFcn, ...
                   'gui_LayoutFcn',  [] , ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end

if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT


% --- Executes just before GUI is made visible.
function GUI_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
% varargin   command line arguments to GUI (see VARARGIN)

% Choose default command line output for GUI
handles.output = hObject;

% Update handles structure
guidata(hObject, handles);

% UIWAIT makes GUI wait for user response (see UIRESUME)
% uiwait(handles.figure1);


% --- Outputs from this function are returned to the command line.
function varargout = GUI_OutputFcn(hObject, eventdata, handles)
% varargout  cell array for returning output args (see VARARGOUT);
% hObject    handle to figure
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)

% Get default command line output from handles structure
varargout{1} = handles.output;


% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
fprintf('\nRecognizing...\n\n');
% Load the trained GMM models
load speakerData;
load speakerGmm;
waveDir = 'trainning\';               % directory holding the test set
Test_speakerData = dir(waveDir);      % directory listing as a struct array
Test_speakerData(1:2) = [];           % drop the '.' and '..' entries
Test_speakerNum = length(Test_speakerData);
Test_speakerNumcount = 0;
%%%%%%%%%%%%%%%%
for i = 1:Test_speakerNum
    %%% Read the audio file
    [filename, filepath] = uigetfile('*.wav', 'Select an audio file');
    set(handles.text1, 'string', filepath)
    filep = strcat(filepath, filename);
    [testing_data, fs] = audioread(filep);
    sound(testing_data, fs);
    save testing_data                 % writes the whole workspace to testing_data.mat
    load testing_data
    y = testing_data;
    axes(handles.axes1)
    plot(y);
    xlabel('t'); ylabel('Amplitude');
    title('Time domain');
    % Frequency domain
    % Magnitude spectrum
    N = length(y);
    fs1 = 100;                        % sampling frequency (unused below)
    n = 0:N-1;
    t = n/fs;                         % time axis
    yfft = fft(y, N);
    mag = abs(yfft);                  % magnitude of the spectrum
    f = n*fs/N;                       % frequency axis
    axes(handles.axes2)
    % Plot magnitude against frequency up to the Nyquist frequency;
    % floor keeps the index an integer when N is odd
    plot(f(1:floor(N/2)), mag(1:floor(N/2)));
    xlabel('Frequency/Hz'); ylabel('Magnitude');
    title('Frequency domain');
    % Phase spectrum
    A = abs(yfft);
    ph = 2*angle(yfft(1:floor(N/2)));
    ph = ph*180/pi;
    axes(handles.axes3);
    plot(f(1:floor(N/2)), ph(1:floor(N/2)));
    xlabel('Frequency/Hz'), ylabel('Phase angle'), title('Phase spectrum of digits 0-9');
    % Power spectrum
    Fs = 1000;
    n = 0:1/Fs:1;
    xn = y;
    nfft = 1024;
    window = boxcar(length(n));       % rectangular window
    noverlap = 0;                     % no overlap between segments
    p = 0.9;                          % confidence level
    % psd is a legacy Signal Processing Toolbox function and may be
    % unavailable in recent releases
    [Pxx, Pxxc] = psd(xn, nfft, Fs, window, noverlap, p);
    index = 0:round(nfft/2-1);
    k = index*Fs/nfft;
    plot_Pxx = 10*log10(Pxx(index+1));
    plot_Pxxc = 10*log10(Pxxc(index+1));
    axes(handles.axes4)
    plot(k, plot_Pxx);
    title('Power spectrum of digits 0-9');
    axes(handles.axes5)
    surf(speakerData(1).mfcc);        % 3-D plot of the MFCCs
    title('3-D MFCC of the first speaker');  % MFCC = mel-frequency cepstral coefficients
    % Plot all 2-D MFCC curves of the first speaker
    axes(handles.axes6)

% ... (the source excerpt skips ahead here; the lines below are the end of
% the pushbutton5 training callback) ...

for i = 1:speakerNum
    fprintf('\nTraining GMM for speaker %d (%s)...', i, speakerData(i).name(1:end-4));
    [speakerGmm(i).mu, speakerGmm(i).sigm, speakerGmm(i).c] = ...
        gmm_estimate(speakerData(i).mfcc(:,5:12)', gaussianNum, 20);  % the transpose is correct
end
fprintf('\n');
save speakerGmm speakerGmm;           % save the trained GMMs
% hObject    handle to pushbutton5 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)


% --- Executes on button press in pushbutton6.
function pushbutton6_Callback(hObject, eventdata, handles)
clc
close all
% hObject    handle to pushbutton6 (see GCBO)
% eventdata  reserved - to be defined in a future version of MATLAB
% handles    structure with handles and user data (see GUIDATA)
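The spectrum plots in pushbutton1_Callback can be tried outside the GUI with base MATLAB only. The sketch below is not the author's code: it uses a synthetic 1 kHz tone in place of a recording, and it estimates the power spectrum with a simple FFT periodogram (rectangular window, single segment) rather than the `psd` call above. The sampling rate and FFT length are assumed values.

```matlab
fs = 8000;                        % assumed sampling rate in Hz
N  = 1024;                        % number of samples / FFT length
t  = (0:N-1)' / fs;
y  = sin(2*pi*1000*t);            % 1 kHz test tone instead of a recording

Y    = fft(y, N);
half = floor(N/2);                % bins below the Nyquist frequency
f    = (0:half-1)' * fs / N;      % frequency axis in Hz
mag  = abs(Y(1:half));            % magnitude spectrum

% Periodogram estimate of the power spectral density.
Pxx = abs(Y(1:half)).^2 / (fs * N);
Pxx(2:end) = 2 * Pxx(2:end);      % single-sided: double all non-DC bins
PxxdB = 10*log10(Pxx + eps);      % eps avoids log10(0)

[~, iPeak] = max(mag);
peakHz = f(iPeak);                % frequency of the strongest component
disp(peakHz);
```

Inside the GUI, `plot(f, mag)` and `plot(f, PxxdB)` would replace the `disp` call, with `axes(handles.axesN)` selecting the target panel as in the callback.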