Abstract
Speech is one of the ways humans interact with computers, and speech recognition is an important part of artificial intelligence. In recent years, neural networks have been applied to speech recognition at a rapid pace and have become the mainstream acoustic modeling technique in the field. Under test conditions, however, the target speaker's speech differs from the training data, which causes a mismatch between the model and its input. Speaker adaptation (SA) methods are designed to address the mismatch caused by speaker differences, and research on them has become an active direction in speech recognition. Compared with speaker adaptation in traditional speech recognition models, adaptation in neural-network-based systems must cope with a very large number of model parameters and a relatively small amount of adaptation data, which makes speaker adaptation in such systems a difficult research problem. This paper first reviews the development of speaker adaptation methods and the problems encountered in research on neural-network-based speaker adaptation. It then divides speaker adaptation methods into feature-domain and model-domain methods and introduces their principles and improvements. Finally, it points out the problems that remain in speaker adaptation for speech recognition and directions for future development.
Authors
朱方圆
马志强
陈艳
张晓旭
王洪彬
宝财吉拉呼
ZHU Fangyuan; MA Zhiqiang; CHEN Yan; ZHANG Xiaoxu; WANG Hongbin; BAO Caijilahu (College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China; Inner Mongolia Autonomous Region Engineering & Technology Research Centre of Big Data Based Software Service, Inner Mongolia University of Technology, Hohhot 010080, China)
Source
《计算机科学与探索》
CSCD
Peking University Core Journal
2021, No. 12, pp. 2241-2255 (15 pages)
Journal of Frontiers of Computer Science and Technology
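Funding
National Natural Science Foundation of China (61762070, 61862048)
Natural Science Foundation of Inner Mongolia Autonomous Region (2019MS06004)
Science and Technology Major Project of Inner Mongolia Autonomous Region (2019ZD015)
Key Technology Research Program of Inner Mongolia Autonomous Region (2019GG273)
Special Fund for the Transformation of Scientific and Technological Achievements of Inner Mongolia Autonomous Region (2020CG0073).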