Speech classification model based on improved Inception network

Cited by: 1
Abstract: Traditional audio classification models rely on a cumbersome feature-extraction process, and existing neural network models suffer from overfitting, low classification accuracy, and vanishing gradients. To address these problems, a speech classification model based on an improved Inception network is proposed. First, the residual skip connection idea from the Residual Network (ResNet) is added to the traditional InceptionV2 model, so that the network can be deepened while avoiding vanishing gradients. Second, the convolution kernel sizes in the Inception module are optimized, and convolutions of different sizes extract deep features from the Log-Mel spectrogram of the original speech, so that the model learns by itself which convolutions are appropriate for the data. At the same time, the model is improved in both depth and width to increase classification accuracy. Finally, the trained network classifies the speech data, and the classification result is obtained through the Softmax function. Experiments on the Tsinghua University Chinese speech dataset THCHS-30 and the environmental sound dataset UrbanSound8K show that the improved Inception network reaches classification accuracies of 92.76% and 93.34% on the two datasets, respectively. Compared with models such as VGG16 (Visual Geometry Group), InceptionV2, and GoogLeNet, the proposed model achieves the best classification accuracy, with a maximum improvement of 27.30 percentage points. The proposed model has stronger feature fusion ability, gives more accurate classification results, and can mitigate problems such as overfitting and vanishing gradients.
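The abstract gives no layer sizes, kernel sizes, or code, so the following is only a minimal sketch of the general idea it describes: parallel convolutions of different sizes whose outputs are fused, a residual skip connection that adds the input back, and a Softmax over class scores. It assumes a single-channel 1-D toy signal instead of a 2-D Log-Mel spectrogram, and all function names, kernel values, and mixing weights are hypothetical illustrations, not the authors' architecture.

```python
import math


def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padded so len(out) == len(x)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def residual_inception_block(x, kernels, mix):
    """Toy Inception-style block with a residual skip connection.

    kernels: one odd-length kernel per parallel branch (e.g. sizes 1, 3, 5).
    mix: one weight per branch; with one channel per branch, this weighted sum
         is equivalent to concatenating the branches and applying a 1x1 projection.
    """
    branches = [relu(conv1d_same(x, kernel)) for kernel in kernels]
    fused = [sum(w * b[i] for w, b in zip(mix, branches)) for i in range(len(x))]
    # Skip connection: the block learns a correction that is added to its input.
    return [xi + fi for xi, fi in zip(x, fused)]


def softmax(logits):
    """Numerically stable Softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical usage: three branches with kernel sizes 1, 3, and 5.
signal = [0.2, 0.9, -0.4, 0.1, 0.7]
kernels = [[1.0], [0.25, 0.5, 0.25], [0.1, 0.2, 0.4, 0.2, 0.1]]
features = residual_inception_block(signal, kernels, mix=[0.5, 0.3, 0.2])
# Stand-in classifier head: treat the first three features as class scores.
class_probabilities = softmax(features[:3])
```

Because of the skip connection, a block whose branches output zero reduces to the identity mapping, which is the property that lets deeper stacks keep a direct gradient path to earlier layers.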
Authors: ZHANG Qiuyu (张秋余); WANG Yukun (王煜坤) (School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 3, pp. 909-915 (7 pages)
Funding: National Natural Science Foundation of China (61862041).
Keywords: speech classification; convolutional neural network; residual skip connection; Log-Mel spectrogram; depth feature