
Robust multi-stream speech recognition based on weighting the output probabilities of feature components

Cited by: 2
Abstract: Traditional multi-stream fusion methods in speech recognition control each stream's influence on the decision by weighting the stream outputs, without accounting for how differently the feature components within a stream are affected by noise. This paper proposes a new stream-fusion method that weights not only the stream outputs but also the output probabilities of the individual feature components. The effect of this component-level weighting on the decision is analysed, and two concrete fusion schemes are derived from the marginalisation and soft-decision models of missing-data techniques. Experiments on a hybrid sub-band speech recognition system show that the proposed approaches adapt the stream influences to the noise environment and significantly outperform traditional multi-stream methods in various noisy conditions.
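The component-level weighting described in the abstract can be illustrated with a small, self-contained sketch. The Python code below is an assumption-laden illustration rather than the paper's implementation: it assumes a diagonal-covariance Gaussian observation model for each HMM state, a classical per-stream exponent, and an additional weight on the log output probability of each feature component inside its stream; in a soft-decision flavour that weight would be an estimated reliability of the component, and driving unreliable components toward weight 0 mimics marginalisation. All names (weighted_stream_loglik, comp_weights, stream_weights) are hypothetical.

import numpy as np

# Minimal sketch (assumed model, not the paper's code): combine per-component
# weighted log-likelihoods within each stream, then weight the streams.

def gaussian_component_loglik(x, mean, var):
    """Per-dimension log N(x; mean, var) for a diagonal Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def weighted_stream_loglik(obs, means, vars_, comp_weights, stream_weights):
    """obs, means, vars_, comp_weights: one 1-D array per stream, indexed by
    that stream's feature components. stream_weights: one exponent per stream.
    Returns the combined state log-likelihood in which every feature
    component's output probability carries its own weight."""
    total = 0.0
    for x, mu, var, w_d, gamma in zip(obs, means, vars_, comp_weights, stream_weights):
        comp_ll = gaussian_component_loglik(x, mu, var)   # per-component log-probabilities
        total += gamma * np.sum(w_d * comp_ll)            # weight inside the stream as well
    return total

# Toy usage: two streams (e.g. two sub-bands) with 3 and 2 feature components.
obs            = [np.array([0.1, -0.4, 0.7]), np.array([1.2, 0.3])]
means          = [np.zeros(3),                np.zeros(2)]
vars_          = [np.ones(3),                 np.ones(2)]
comp_weights   = [np.array([0.9, 0.2, 1.0]),  np.array([1.0, 0.5])]   # assumed reliabilities
stream_weights = [0.6, 0.4]
print(weighted_stream_loglik(obs, means, vars_, comp_weights, stream_weights))

With all comp_weights equal to 1 this reduces to conventional exponent-weighted stream combination, which is how the abstract positions the proposed method: a generalisation that also lets noise-corrupted feature components within a stream count for less.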
Source: Acta Acustica (《声学学报》), 2008, No. 2, pp. 102-108 (7 pages); indexed by EI, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China Youth Science Fund (60502041) and the Guangdong Provincial Natural Science Foundation Doctoral Start-up Fund (05300146).
Keywords: speech recognition systems; multi-stream; recognition methods; feature components; weighting; probability; output; robustness; data flow analysis; feature extraction; hidden Markov models
