Abstract
To improve the accuracy of sung speech recognition, this paper proposes TSC (Transformer with spelling correction), a Transformer-based recognition method with an added correction model. The attention mechanism lets the network learn the pronunciation that corresponds to each lyric. In the input module, a feature extraction layer built from a convolutional neural network (CNN) extracts singing features. After the output module, a correction model composed of a CNN and a bidirectional recurrent neural network revises the model's output. Because singing samples are scarce and the model is therefore hard to train, the model is pre-trained on the Mandarin speech dataset AISHELL-1, and a self-built dataset is used for data augmentation while fine-tuning the parameters of the sung speech recognition model. Experiments on the augmented Opencpop singing dataset show that the proposed system reduces the character error rate (CER) to 31.92%, improving recognition accuracy by about 23% over the current baseline method.
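The headline result is reported as a character error rate (CER). As a minimal sketch, assuming the standard definition CER = (S + D + I) / N, where S, D and I are the character substitutions, deletions and insertions needed to align a hypothesis with its reference and N is the number of reference characters, the metric can be computed from a character-level Levenshtein distance. The function names `edit_distance` and `character_error_rate` and the corpus-level aggregation (total edits divided by total reference characters) are hypothetical illustrations; the paper does not describe its scoring script or text normalization.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance; each substitution, deletion or insertion costs 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def character_error_rate(refs: list[str], hyps: list[str]) -> float:
    """Hypothetical corpus-level CER: total character edits over total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


if __name__ == "__main__":
    # One deleted character out of five reference characters gives a CER of 0.2.
    print(character_error_rate(["我爱你中国"], ["我爱你中"]))
```

Because insertions are counted, CER is not bounded by 100%; a CER of 31.92% corresponds to roughly 32 character edits per 100 reference characters.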
Authors
吴影
徐雅斌
孟晶晶
WU Ying; XU Yabin; MENG Jingjing (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China; Computer School, Beijing Information Science & Technology University, Beijing 100101, China; Big Data Security Technology Research Institute, Beijing Information Science & Technology University, Beijing 100101, China)
Source
Journal of Beijing Information Science and Technology University (Natural Science Edition) (《北京信息科技大学学报(自然科学版)》)
2023, No. 3, pp. 35-42, 51 (9 pages)
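Funding
National Natural Science Foundation of China (61672101)
Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICCD XN004)
Open Project of the Key Laboratory of Information Network Security, Ministry of Public Security (C18601)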