Abstract
Compressed-domain automatic speaker recognition (CD-ASR) extracts features directly from the coded speech bit stream, avoiding parameter decoding and waveform resynthesis. This paper proposes a compressed-domain speaker recognition approach for VoIP based on the probabilistic stochastic histogram, in two variants: the vector quantization probabilistic stochastic histogram (VQPSH) and the Gaussian mixture model probabilistic stochastic histogram (GMMPSH). First, compressed-domain feature extraction schemes are described for G.729, G.723.1 (6.3 kb/s), and G.723.1 (5.3 kb/s) bit streams. Then VQPSH and GMMPSH are each used as the recognition model for speaker recognition. Experimental results show that the probabilistic stochastic histogram methods achieve a substantially higher recognition rate than a GMM using the same features extracted from the compressed bit stream.
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
CSCD
Peking University Core Journal
2009, No. 5, pp. 594-599 (6 pages)
Funding
Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA01Z146)
Keywords
compressed-domain automatic speaker recognition (CD-ASR)
vector quantization probabilistic stochastic histogram (VQPSH)
Gaussian mixture model probabilistic stochastic histogram (GMMPSH)