Abstract
In text-independent speaker recognition, accuracy degrades markedly as the number of target speakers grows. To address this problem, a high-accuracy method for large-scale speaker recognition is proposed. The method combines the frame-level acoustic features of several consecutive audio frames into an acoustic feature map, from which high-dimensional 2D-Haar acoustic features are extracted, making it possible to train a better-performing classifier. The AdaBoost.MH algorithm is then used to select combinations of 2D-Haar acoustic features with good discriminative power for classifier training. Experimental results show a correct recognition rate of 89.5% with 600 target speakers and an average accuracy of 91.3% over 100 to 600 target speakers. The method is suitable for large-scale speaker recognition, the introduced 2D-Haar acoustic features are effective, and recognition accuracy is high. The method also has low algorithmic complexity and high time efficiency.
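The feature construction described in the abstract can be illustrated with a minimal sketch. The abstract does not state which frame-level features, map size, or Haar templates are used, so everything below is an assumption for illustration only: the toy 4x4 map, the two two-rectangle templates, and all function names (`integral_image`, `rect_sum`, `haar_horizontal`, `haar_vertical`) are hypothetical. The sketch stacks consecutive frame vectors as rows of a feature map and evaluates rectangle-difference features through a summed-area table, which is the usual way Haar-like features are computed cheaply.

```python
def integral_image(fmap):
    """Summed-area table with an extra leading zero row and column."""
    rows, cols = len(fmap), len(fmap[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += fmap[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the map over a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_horizontal(ii, top, left, height, width):
    """Left half minus right half (width assumed even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


def haar_vertical(ii, top, left, height, width):
    """Top half minus bottom half (height assumed even)."""
    half = height // 2
    return (rect_sum(ii, top, left, half, width)
            - rect_sum(ii, top + half, left, half, width))


# Assumed toy input: 4 consecutive frames, each with 4 acoustic coefficients.
# Each frame vector becomes one row of the acoustic feature map.
frames = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
ii = integral_image(frames)
h_feature = haar_horizontal(ii, 0, 0, 4, 4)  # contrast across coefficients
v_feature = haar_vertical(ii, 0, 0, 4, 4)    # contrast across time (frames)
```

Sliding such templates over every position and scale of the map yields a feature vector far larger than the map itself, which is consistent with the abstract's description of the 2D-Haar features as high-dimensional and with the need for a selection step such as AdaBoost.MH.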
Source
Transactions of Beijing Institute of Technology (《北京理工大学学报》)
Indexed in: EI, CAS, CSCD, PKU Core (北大核心)
2014, No. 11, pp. 1196-1201 (6 pages)
Funding
National 242 Program Fund project (2005C48)
Beijing Institute of Technology Science and Technology Innovation Program (2011CX01015)