Language identification based on auditory and vocal characteristics


Abstract: Existing methods perform poorly at language identification in low signal-to-noise ratio (SNR) environments. To address this, a language identification method is proposed that fuses cochlear filter coefficients with vocal tract impulse response spectral parameters, so that the resulting feature characterizes both human cochlear auditory characteristics and human vocal characteristics. First, cochlear filter coefficients that simulate the auditory characteristics of the human ear are extracted. Next, they are fused with vocal tract impulse response spectral parameters, which characterize human vocalization. Finally, the proposed method is tested on language identification using a Gaussian mixture model-universal background model (GMM-UBM). Experimental results show that the method outperforms the other compared methods in all four SNR environments tested. Relative to the deep-learning-based log Mel-scale filter bank energy feature, identification accuracy improves by 16.1%, and the method also improves substantially on the other compared methods.
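The pipeline in the abstract (extract two feature streams, fuse them, score with Gaussian mixture models) can be illustrated with a minimal sketch. The abstract does not say how the fusion is performed, so frame-wise concatenation is assumed here. The sketch also scores against already-trained diagonal-covariance GMMs and omits UBM training and MAP adaptation. The names `fuse_features`, `gmm_avg_loglik`, and `identify` are hypothetical and do not come from the paper.

```python
import numpy as np


def fuse_features(auditory, vocal):
    """Fuse two (frames, dims) feature matrices by frame-wise concatenation.

    Assumption: the paper's fusion rule is not given in the abstract;
    plain concatenation is used here for illustration only.
    """
    n = min(len(auditory), len(vocal))  # truncate to the common frame count
    return np.hstack([auditory[:n], vocal[:n]])


def gmm_avg_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood of x under a diagonal-covariance GMM.

    x: (T, D); weights: (K,); means and variances: (K, D).
    """
    diff = x[:, None, :] - means[None, :, :]                 # (T, K, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                # Mahalanobis term
        + np.sum(np.log(2 * np.pi * variances), axis=1)      # normalization term
    )                                                        # (T, K)
    log_mix = log_gauss + np.log(weights)                    # add mixture weights
    m = log_mix.max(axis=1, keepdims=True)                   # log-sum-exp trick
    frame_ll = m[:, 0] + np.log(np.exp(log_mix - m).sum(axis=1))
    return float(np.mean(frame_ll))


def identify(x, models):
    """Return the label of the language model with the highest score.

    models maps a label to a (weights, means, variances) tuple.
    """
    return max(models, key=lambda lang: gmm_avg_loglik(x, *models[lang]))
```

In a GMM-UBM system the per-language models would normally be derived from a shared background model by adaptation; here they are simply passed in as parameters so the scoring step can be read on its own.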

Authors: HUA Ying-jie; DUO Lin; LIU Jing; SHAO Yu-bin (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China)

Affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology

Source: Journal of Yunnan University (Natural Sciences Edition); indexed in CAS, CSCD, and the Peking University Core Journals list; 2023, No. 4, pp. 807-814 (8 pages)

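Funding: National Natural Science Foundation of China (61962032); Outstanding Youth Project of the Yunnan Provincial Department of Science and Technology (202001AW07000).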

Keywords: language identification; cochlear filter coefficient; vocal tract impulse response spectral parameters; Gaussian mixture general background model

Classification number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]


