Compressed-Domain Automatic Speaker Recognition Based on Probabilistic Stochastic Histogram


Abstract: Compressed-domain automatic speaker recognition (CD-ASR) extracts features directly from the coded speech bitstream, avoiding parameter decoding and waveform resynthesis. This paper proposes a compressed-domain speaker recognition approach for VoIP based on the probabilistic stochastic histogram, in two variants: the vector quantization probabilistic stochastic histogram (VQPSH) and the Gaussian mixture model probabilistic stochastic histogram (GMMPSH). Compressed-domain feature extraction schemes are first described for G.729, G.723.1 (6.3 kb/s), and G.723.1 (5.3 kb/s) bitstreams; VQPSH and GMMPSH are then used as the recognition models. Experimental results show that the probabilistic stochastic histogram methods achieve a substantially higher recognition rate than a classical GMM using the same parameters extracted from the compressed bitstream.
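
The abstract does not spell out how the histogram models are built or scored, so the following is only a minimal sketch of one common way a vector-quantization histogram can serve as a speaker model, not the authors' algorithm. It assumes that each frame yields a small feature vector (here toy 2-D points standing in for parameters read from the bitstream), that a shared codebook is already available, and that a test utterance is scored by the average log-probability of its codeword indices under each speaker's smoothed histogram. All names (`nearest_codeword`, `vq_histogram`, `identify`, `spk_a`, `spk_b`) and the smoothing constant `alpha` are hypothetical.

```python
import math

def nearest_codeword(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])),
    )

def vq_histogram(frames, codebook, alpha=1.0):
    """Normalized codeword-occupancy histogram with additive smoothing alpha.

    Smoothing (an assumption of this sketch) keeps every bin non-zero so that
    the log-probability used for scoring is always defined.
    """
    counts = [0] * len(codebook)
    for vec in frames:
        counts[nearest_codeword(vec, codebook)] += 1
    total = len(frames) + alpha * len(codebook)
    return [(c + alpha) / total for c in counts]

def avg_log_likelihood(frames, hist, codebook):
    """Average log-probability of the frames' codeword indices under hist."""
    return sum(
        math.log(hist[nearest_codeword(v, codebook)]) for v in frames
    ) / len(frames)

def identify(frames, speaker_hists, codebook):
    """Return the speaker whose histogram gives the highest average score."""
    return max(
        speaker_hists,
        key=lambda s: avg_log_likelihood(frames, speaker_hists[s], codebook),
    )

# Toy data: a 3-entry codebook and two hypothetical enrolled speakers.
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
train = {
    "spk_a": [(0.1, 0.0), (0.0, 0.2), (0.9, 1.1), (0.1, -0.1)],
    "spk_b": [(2.1, 0.1), (1.9, 0.0), (1.0, 0.9), (2.0, -0.2)],
}
speaker_hists = {s: vq_histogram(f, codebook) for s, f in train.items()}

test_frames = [(0.05, 0.1), (0.2, -0.1), (1.1, 1.0)]
result = identify(test_frames, speaker_hists, codebook)
print(result)
```

In this toy setup the test frames fall mostly in the first codebook cell, which dominates the histogram of `spk_a`, so that speaker receives the higher score. A real system would train the codebook on actual compressed-domain parameters and would likely use far more codewords and frames.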

Authors: Qu Dan (屈丹), Yan Honggang (闫红刚), Tang Hui (唐晖), Wang Bingxi (王炳锡)

Affiliation: Institute of Information Engineering, PLA Information Engineering University (解放军信息工程大学信息工程学院)

Source: Journal of Data Acquisition and Processing (《数据采集与处理》), indexed in CSCD and the Peking University Core Journals list; 2009, No. 5, pp. 594-599 (6 pages)

Funding: Supported by the National High Technology Research and Development Program of China ("863" Program), grant 2006AA01Z146

Keywords: compressed-domain automatic speaker recognition (CD-ASR); vector quantization probabilistic stochastic histogram (VQPSH); Gaussian mixture model probabilistic stochastic histogram (GMMPSH)

Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]


