Advancements and Prospects in Dysarthria Speaker Adaptation
Abstract: Automatic speech recognition (ASR) tools can make communication between people with dysarthria and typical speakers smoother, and dysarthric speech recognition has therefore become an active research topic in recent years. Research on dysarthric speech recognition involves collecting speech data from people with dysarthria and from typical speakers, representing the acoustic features of both kinds of speech, and using machine learning models to compare and recognize what was said and to locate the differences, so as to help people with dysarthria improve their pronunciation. However, collecting large amounts of speech from people with dysarthria is very difficult, and their pronunciation is highly variable, so general-purpose speech recognition models often perform poorly on it. To address this problem, many studies have introduced speaker adaptation methods into dysarthric speech recognition. A survey of the relevant literature shows that current work analyzes dysarthric speech mainly in two domains: the feature domain and the model domain. This paper focuses on how feature transformations and auxiliary features address differences in how speech features are represented, and on how linear transformations of acoustic models, fine-tuning of acoustic model parameters, and data-selection-based domain adaptation improve recognition accuracy. Finally, it summarizes the open problems in dysarthric speaker adaptation and suggests that future research can improve the effectiveness of dysarthric speech recognition models by analyzing speech variability, fusing multi-feature and multimodal data, and developing adaptation methods that work with small amounts of data.
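The abstract separates adaptation into a feature domain and a model domain. As a rough illustration of one idea from each side, the sketch below shows two deliberately simple, generic techniques; neither is claimed to be a method evaluated in the surveyed papers. `speaker_cmvn` is per-speaker cepstral mean and variance normalization (CMVN), a basic feature-domain transform that removes speaker-level offsets and scale differences. `select_utterances` is a hypothetical data-selection heuristic that keeps the candidate (e.g. typical-speech) utterances whose mean feature vector lies closest, in Euclidean distance, to a target dysarthric speaker's mean; in practice the selected data would then be used to fine-tune an acoustic model. The frame representation (plain lists of floats such as MFCCs), the function names, and the distance criterion are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence

Frame = List[float]  # one acoustic feature vector, e.g. MFCCs (assumed format)


def speaker_cmvn(frames: Sequence[Frame], eps: float = 1e-8) -> List[Frame]:
    """Per-speaker cepstral mean and variance normalization (CMVN).

    Shifts every feature dimension to zero mean and scales it to unit
    variance, using statistics pooled over all frames of ONE speaker.
    Assumes a non-empty list of equal-length frames.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps guards against division by zero for constant dimensions.
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)] for f in frames]


def mean_vector(frames: Sequence[Frame]) -> Frame:
    """Average feature vector of a non-empty utterance or speaker set."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def select_utterances(candidates: Dict[str, Sequence[Frame]],
                      target_frames: Sequence[Frame],
                      k: int) -> List[str]:
    """Hypothetical data-selection heuristic (illustration only).

    Ranks candidate utterances by the Euclidean distance between their mean
    feature vector and the target speaker's mean feature vector, and returns
    the ids of the k closest ones as adaptation data.
    """
    target = mean_vector(target_frames)

    def distance(utt_id: str) -> float:
        return math.dist(mean_vector(candidates[utt_id]), target)

    return sorted(candidates, key=distance)[:k]


if __name__ == "__main__":
    dysarthric = [[1.0, 10.0], [3.0, 14.0], [5.0, 12.0]]  # toy target-speaker frames
    print(speaker_cmvn(dysarthric))
    pool = {"utt_a": [[2.9, 12.1]], "utt_b": [[0.0, 0.0]], "utt_c": [[3.5, 11.0]]}
    print(select_utterances(pool, dysarthric, k=2))  # prints ['utt_a', 'utt_c']
```

Real systems typically use richer transforms and selection criteria than a global mean and a single distance; the point here is only to make the feature-domain versus data-selection distinction concrete.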
Authors: KANG Xinchen; DONG Xueyan; YAO Dengfeng; ZHONG Jinghua (Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China; Lab of Computational Linguistics, School of Humanities, Tsinghua University, Beijing 100084, China; Center for Psychology and Cognitive Science, Tsinghua University, Beijing 100084, China)
Source: Computer Science (《计算机科学》), 2024, No. 8, pp. 11-19 (9 pages). Indexed in CSCD and the Peking University Core Journals list.
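Funding: Beijing Natural Science Foundation (4202028); National Language Commission of China project (YB145-25); National Natural Science Foundation of China (62036001); National Social Science Fund of China (21BYY106, 21&ZD292); 2019 General Science and Technology Project of the Beijing Municipal Education Commission (KM201911417005).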
Keywords: dysarthria; speaker adaptation; auxiliary features; transformation; fine-tuning; domain adaptation