
Research Advances and Perspectives on the Cocktail Party Problem and Related Auditory Models (cited by: 20)
Abstract: With the rapid development of electronic devices and artificial intelligence in recent years, speech-based human-machine interaction has become increasingly important. However, because of interfering sound sources, speech interaction in complex open environments such as cocktail parties is still far from satisfactory, and developing a computational auditory system with strong adaptivity and robustness remains very challenging. In-depth study of the cocktail party problem therefore has considerable research and practical value for a range of intelligent speech processing tasks, including speaker recognition, speech recognition, and keyword spotting. This paper reviews the current state and prospects of auditory models related to the cocktail party problem. We first briefly introduce relevant research on auditory mechanisms and summarize computational models for multi-speaker speech separation aimed at the cocktail party problem. We then discuss auditory attention modeling methods inspired by auditory cognitive mechanisms, and argue that auditory models incorporating voiceprint memory and attentional selection adapt better to complex auditory environments. Next, we briefly review recent multi-speaker speech recognition models. Finally, we discuss the difficulties and challenges that current computational models face in handling the cocktail party problem and outline directions for future research.
Authors: HUANG Ya-Ting (黄雅婷); SHI Jing (石晶); XU Jia-Ming (许家铭); XU Bo (徐波) (Institute of Automation, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; Center for Excellence in Brain Science and Intelligence Technology, CAS, Shanghai 200031)
Source: Acta Automatica Sinica (《自动化学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2019, No. 2, pp. 234-251 (18 pages)
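Funding: Supported by the National Natural Science Foundation of China (61602479), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDBS01070000), and the Beijing Municipal Science and Technology Major Project (Z181100001518006)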
Keywords: cocktail party problem; auditory model; speech separation; auditory attention; speech recognition
