Abstract
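For text-independent, speaker-independent age recognition, this paper proposes a statistical-analysis recognition method based on multi-resolution features of effective frequency bands. The input speech is decomposed into effective frequency bands by the wavelet packet transform; the wavelet packet coefficients of the effective bands are then concatenated into a single sequence from which Mel-frequency cepstral coefficients are computed, yielding the effective-band multi-resolution feature WPMFC (Wavelet Packet Mel-Frequency Cepstrum). Speakers are divided by age into four stages (children, youth, middle-aged and elderly), and the speech of each age group is further trained by gender, giving eight Gaussian mixture models. Test speech is classified according to the maximum likelihood criterion. Experiments compare the proposed method with the traditional short-time spectral statistical analysis method; the results show that the proposed method achieves better recognition performance, with an in-set average recognition rate of 65.17%. The results also indicate that variation in speakers' pronunciation characteristics affects recognition performance more than variation in the spoken text.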
For speaker- and text-independent age recognition, a new multi-resolution feature extraction algorithm is proposed. The input speech is decomposed by the wavelet packet transform, and the wavelet packet coefficients of each effective frequency band are then concatenated to form an intermediate signal whose Mel-frequency cepstrum coefficients are computed; the result is called the Wavelet Packet Mel-Frequency Cepstrum Coefficient (WPMFC). Speakers are divided into four age groups (children, youths, adults and older speakers), and a total of eight Gaussian mixture models are trained, one for each combination of age group and gender. The recognition decision for test speech is based on the maximum likelihood criterion. The experimental results show that age recognition based on the proposed feature extraction algorithm performs better than traditional short-time spectral statistical analysis methods: the average recognition rate of outset speaker age reached 65.17%. Moreover, changes in the characteristics of the speaker's pronunciation influence recognition performance more than changes in the voice content.
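The maximum likelihood decision over the eight age-and-gender models can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes the WPMFC feature vectors have already been extracted, assumes diagonal-covariance Gaussian components, and uses hypothetical one-dimensional toy parameters in place of trained models. The names `classify`, `gmm_log_likelihood` and `models` are invented for the example.

```python
import math

AGE_GROUPS = ("child", "youth", "middle-aged", "elderly")
GENDERS = ("female", "male")


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one GMM.

    gmm is a list of (weight, mean, var) components (assumed layout).
    """
    total = 0.0
    for x in frames:
        # log-sum-exp over mixture components for numerical stability
        terms = [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in gmm]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def classify(frames, models):
    """Return the (age_group, gender) key whose GMM scores the frames highest."""
    return max(models, key=lambda key: gmm_log_likelihood(frames, models[key]))


# Hypothetical toy models: 1-D "features", two components per model.
# Real models would be trained on WPMFC vectors for each age group and gender.
models = {}
for i, age in enumerate(AGE_GROUPS):
    for j, gender in enumerate(GENDERS):
        center = 10.0 * i + 3.0 * j
        models[(age, gender)] = [
            (0.5, [center - 0.5], [1.0]),
            (0.5, [center + 0.5], [1.0]),
        ]

# Toy test utterance: three frames clustered near the "middle-aged female" model
test_frames = [[20.2], [19.7], [20.5]]
decision = classify(test_frames, models)
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplification when scoring an utterance against a GMM; the model with the largest total determines both the age group and the gender label.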
Source
《信号处理》
CSCD
Peking University Core Journals (北大核心)
2016, No. 9, pp. 1101-1107 (7 pages)
Journal of Signal Processing
Keywords
speaker age recognition (说话人年龄识别)
effective frequency bands (有效频带)
multi-resolution features (多分辨率特征)
wavelet packet transform (小波包变换)