Abstract
Deep learning methods have been extensively studied and applied in image recognition and are gradually being applied to language identification. To address the problem that the two-dimensional feature maps used in deep learning language identification models are highly similar across languages and thus easily confused, a ResNeSt language identification model based on counterfactual attention learning is proposed. On the basis of a broadcast speech dataset of Yunnan border-region languages, MFCC, Fbank, and spectrogram features are first extracted as inputs to three networks: FcaNet, ResNet, and ResNeSt. Comparing the recognition performance of the three networks across different signal-to-noise ratios and speech features shows that the ResNeSt network and the Fbank feature perform best overall on the language identification task. A counterfactual attention learning module is then introduced into the best-performing ResNeSt model: counterfactual causality is used to measure the quality of the attention features in the ResNeSt network, encouraging the network to learn more effective attention features and thereby improving training. Experimental results show that with counterfactual attention learning, the language identification rate on Fbank features improves by 1.61% over the baseline system, and across the MFCC, Fbank, and spectrogram features the counterfactual-attention ResNeSt network improves by an average of 1.33% over the baseline ResNeSt network. Counterfactual attention learning helps the attention mechanism focus on more language-discriminative information, effectively improving the model's recognition performance on the language identification task.
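The abstract describes measuring attention quality via counterfactual causality: the effect of the learned attention is quantified by comparing the prediction it produces against the expected prediction under random (counterfactual) attention maps. A minimal numpy sketch of that idea follows; all dimensions, the `classify` head, and the use of softmax-normalized random attentions are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(features, attention, W):
    # Attention-weighted pooling over frames, then a linear classifier head.
    pooled = (features * attention[:, None]).sum(axis=0)  # (d,)
    return pooled @ W                                     # (C,)

# Toy dimensions (assumed): T time frames, d channels, C language classes.
T, d, C = 8, 16, 4
features = rng.normal(size=(T, d))   # stand-in for a Fbank feature map
W = rng.normal(size=(d, C))          # stand-in for classifier weights

# "Factual" branch: prediction under the learned attention map
# (here a softmax over random logits stands in for the learned map).
logits_att = rng.normal(size=T)
attention = np.exp(logits_att) / np.exp(logits_att).sum()
y_factual = classify(features, attention, W)

# Counterfactual branch: replace the learned attention with random
# attention maps and average the resulting predictions.
n_samples = 32
y_counterfactual = np.zeros(C)
for _ in range(n_samples):
    rand_logits = rng.normal(size=T)
    rand_att = np.exp(rand_logits) / np.exp(rand_logits).sum()
    y_counterfactual += classify(features, rand_att, W)
y_counterfactual /= n_samples

# Total effect of the learned attention on the prediction; in training,
# a loss on this effect would push the attention to matter (i.e. to
# outperform random attention), which is the intuition behind
# counterfactual attention learning.
effect = y_factual - y_counterfactual
```

In a real model the effect would feed an auxiliary classification loss so that gradients explicitly reward attention maps that beat the random baseline.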
Authors
CHEN Si-zhu; LONG Hua; SHAO Yu-bin
(Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Radio Monitoring Center of Yunnan Province, Kunming 650228, China)
Source
Journal of China Academy of Electronics and Information Technology (《中国电子科学研究院学报》)
Peking University Core Journal (北大核心)
2023, No. 12, pp. 1138-1145 (8 pages)
Funding
Supported by the Open Fund of the Key Laboratory of Media Convergence of Yunnan Province (Grant No. 320225403).