Robust multi-stream speech recognition based on weighting the output probabilities of feature components

Abstract: In traditional multi-stream fusion methods for speech recognition, all feature components within a data stream share the same stream weight, even though their distortion levels usually differ when the recognizer operates in noisy environments. To overcome this limitation of traditional multi-stream frameworks, this study proposes a new stream fusion method that weights not only the stream outputs but also the output probabilities of individual feature components. The study analyzes how the stream weights and the feature-component weights in the new method affect the recognition decision. It then proposes two stream fusion schemes based on the marginalisation and soft-decision models used in missing-data techniques. Experiments with a hybrid sub-band multi-stream speech recognizer show that the proposed schemes adaptively adjust each stream's influence on the decision and outperform traditional multi-stream methods in a range of noisy environments.
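The abstract does not give the fusion formulas, so the following is only a minimal sketch, under stated assumptions, of how per-component weights can sit inside the conventional weighted-log-likelihood stream fusion.

It assumes each stream is scored by a single diagonal-covariance Gaussian per HMM state, so a stream's log-likelihood splits into a sum over feature components. The function names, the Gaussian model, and the example values are all hypothetical.

A component weight of 0 drops that component. For a diagonal Gaussian, this is the same as marginalising the component out, which loosely mirrors a hard missing-data mask. Weights between 0 and 1 give a soft down-weighting. This is not necessarily the soft-decision model used in the paper.

```python
import math
from typing import Sequence, Tuple

# Hypothetical stream representation (an assumption, not from the paper):
# (observation, means, variances, component weights), one value per feature
# component, for a single diagonal-covariance Gaussian state model.
Stream = Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float]]


def log_gauss_1d(x: float, mean: float, var: float) -> float:
    """Log density of the univariate Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def stream_log_likelihood(obs: Sequence[float], means: Sequence[float],
                          variances: Sequence[float],
                          comp_weights: Sequence[float]) -> float:
    """Component-weighted log-likelihood of one stream.

    Assumes a diagonal covariance, so the log-likelihood is a sum over
    components; each term is scaled by its weight w_d in [0, 1].
    w_d = 0 removes the component (equivalent to marginalising it out here),
    w_d = 1 keeps it unchanged, and intermediate values down-weight it.
    """
    if not (len(obs) == len(means) == len(variances) == len(comp_weights)):
        raise ValueError("all per-component sequences must have the same length")
    return sum(w * log_gauss_1d(x, m, v)
               for x, m, v, w in zip(obs, means, variances, comp_weights))


def fused_log_likelihood(streams: Sequence[Stream],
                         stream_weights: Sequence[float]) -> float:
    """Conventional multi-stream fusion in the log domain:
    sum over streams of lambda_s * (component-weighted stream log-likelihood)."""
    if len(streams) != len(stream_weights):
        raise ValueError("need one weight per stream")
    return sum(lam * stream_log_likelihood(*s)
               for lam, s in zip(stream_weights, streams))


if __name__ == "__main__":
    # Hypothetical sub-band streams for one frame and one HMM state.
    sub_band_1 = ([0.2, -0.1], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0])  # all components kept
    sub_band_2 = ([3.0, 0.5], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0])   # first component masked
    print(fused_log_likelihood([sub_band_1, sub_band_2], [0.6, 0.4]))
```

With every component weight set to 1, `fused_log_likelihood` reduces to the conventional rule in which all components of a stream share one stream weight. How the component weights would be estimated in practice is outside this sketch.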

Authors: ZHANG Jun, WEI Gang, YU Hua, NING Genxin

Affiliation: College of Electronic & Information Engineering

Source: Chinese Journal of Acoustics (声学学报（英文版）), 2009, Issue 3, pp. 269-279 (11 pages)

Funding: Supported by the National Natural Science Foundation of China (60502041, 60625101) and the Guangdong Natural Science Foundation (05300146).

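Classification (CLC): TP317.1 [Automation and Computer Technology / Computer Software and Theory]; TN912.34 [Electronics and Telecommunications / Communication and Information Systems]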
