Abstract
[Objective] This paper gives a systematic overview of the state-of-the-art technical frameworks and main challenges of Automatic Speech Recognition (ASR) systems, as a reference for further research in the field. [Methods] First, the mainstream end-to-end speech recognition frameworks are introduced, including the Connectionist Temporal Classification (CTC) framework and the attention-based framework. Second, four challenging problems in ASR applications are presented: recognition of noisy and distant-field speech, recognition of Chinese-English code-switching speech, recognition of domain-specific terms, and recognition of low-resource minority languages. [Results] To address the insufficient robustness of end-to-end ASR systems, an improved scheme with enhanced and filtering attention mechanisms is proposed. For the four challenging problems, mainstream solutions and future development directions are discussed. [Conclusions] Large-scale commercial deployment of end-to-end ASR systems still faces major challenges, and solving the four challenging problems will play a key role in promoting industrial applications of ASR.
Authors
刘庆峰
高建清
万根顺
Liu Qingfeng; Gao Jianqing; Wan Genshun (IFLYTEK, Hefei, Anhui 230088, China)
Source
Frontiers of Data & Computing (《数据与计算发展前沿》)
2019, No. 2, pp. 26-36 (11 pages)
Keywords
automatic speech recognition
end-to-end
distant-field speech
code-switching
domain-related terms