Abstract
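To process large volumes of audio at high speed, a fast method for generating acoustic feature super vectors is proposed that effectively improves the recognition speed and accuracy of audio recognition systems. The method first assembles the commonly used acoustic features of several consecutive audio frames into an acoustic feature image, then uses low-complexity operations to quickly extract Haar-like acoustic features from it, with dimensionality reaching several hundred thousand. Next, the AdaBoost.MH algorithm is used to select highly representative combinations of Haar-like acoustic feature patterns, which form the acoustic feature super vector. A Random AdaBoost feature selection method is then proposed to further speed up feature selection. Experimental results show that in three settings (audio event recognition, speaker recognition, and speaker gender recognition), Haar-like acoustic features allow recognition algorithms such as SVM, C5.0, and AdaBoost to achieve higher recognition accuracy than commonly used acoustic features such as MFCC, PLP, and LPCC, while also yielding a 7- to 20-fold speedup in training and a 5- to 10-fold speedup in recognition.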
A fast and efficient method for generating acoustic feature super vectors is proposed to improve on the recognition accuracy and speed achieved with traditional frame-based acoustic features. This paper makes three contributions. First, acoustic feature vectors extracted from a number of consecutive audio frames are combined into an acoustic feature image. Second, the AdaBoost.MH algorithm is used to select highly representative 2D-Haar pattern combinations, from which the super feature vectors are constructed. Third, a random feature selection method is proposed to further increase processing speed. Experimental results show that in three audio recognition tasks (audio event recognition, speaker recognition, and speaker gender recognition), 2D-Haar acoustic feature super vectors allow SVM, C5.0, and AdaBoost classifiers to reach higher recognition accuracy than they do with MFCC, PLP, LPCC, and other traditional acoustic features, while making training 7 to 20 times faster and recognition 5 to 10 times faster.
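The idea of an acoustic feature image with Haar-like features can be illustrated with a minimal sketch. The abstract does not say which low-complexity operation is used; the summed-area table (integral image) below is an assumption, chosen because it is the standard way to evaluate rectangle sums in constant time. All function names are hypothetical, and the two-rectangle pattern is only one of many possible Haar-like patterns, not necessarily one the authors selected.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def feature_image(frames: Sequence[Sequence[float]]) -> Matrix:
    """Stack per-frame feature vectors into a 2D image (rows = frames, cols = coefficients)."""
    width = len(frames[0])
    if any(len(f) != width for f in frames):
        raise ValueError("all frames must have the same feature dimension")
    return [[float(v) for v in f] for f in frames]


def integral_image(img: Matrix) -> Matrix:
    """Summed-area table with one extra zero row and column (assumed technique)."""
    rows, cols = len(img), len(img[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii: Matrix, top: int, left: int, height: int, width: int) -> float:
    """Sum of a rectangle using four table lookups, independent of its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect(ii: Matrix, top: int, left: int, height: int, width: int) -> float:
    """Two-rectangle Haar-like feature: left half minus right half (width must be even)."""
    if width % 2 != 0:
        raise ValueError("width must be even")
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


# Toy data: 3 consecutive frames, 4 coefficients each (values are illustrative only).
frames = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
ii = integral_image(feature_image(frames))
value = haar_two_rect(ii, top=0, left=0, height=3, width=4)  # (33) - (45) = -12
```

Because every rectangle sum costs four lookups, enumerating a very large pool of candidate positions and scales stays cheap, which is consistent with the feature dimensionality reported in the abstract.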
Source
《北京理工大学学报》
EI
CAS
CSCD
PKU Core Journal
2016, No. 3, pp. 295-301 (7 pages)
Transactions of Beijing Institute of Technology
Funding
National "242" Program Project (2005C48)
Beijing Institute of Technology Science and Technology Innovation Program Project (2011CX01015)