Abstract
To address the large parameter counts, heavy computation, and slow training of current speech emotion recognition models, this paper proposes a lightweight network model suited to small data sets. The model is built on the capsule network, with a depthwise separable convolution module replacing the network's original convolution layer to reduce computation. Transfer learning is used to extract general low-level image features, and spectrograms are then used to fine-tune the whole network, which reduces overfitting on small data sets. Cosine similarity is used to measure the similarity of vectors in the dynamic routing structure, improving the performance of the dynamic routing algorithm. Experimental results show that the lightweight capsule network achieves a higher recognition rate and faster operation than the seven deep learning models it was compared against.
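The abstract does not spell out how cosine similarity enters the dynamic routing step. The sketch below shows one plausible reading, assuming the dot-product agreement of standard routing-by-agreement is replaced by the cosine of the angle between each prediction vector and the current output capsule. The function names (`squash`, `cosine_routing`), the tensor layout, and the three routing iterations are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Capsule squashing: keeps the direction of s, maps its length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


def cosine_routing(u_hat, num_iters=3, eps=1e-9):
    """Route prediction vectors u_hat of shape (n_in, n_out, dim).

    Assumption: the agreement term is cos(u_hat, v) instead of the
    dot product used in standard dynamic routing.
    Returns output capsules v (n_out, dim) and couplings c (n_in, n_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, start uniform
    for _ in range(num_iters):
        c = softmax(b, axis=1)  # each input capsule's couplings sum to 1
        s = np.sum(c[:, :, None] * u_hat, axis=0)  # weighted sum per output capsule
        v = squash(s)
        dot = np.sum(u_hat * v[None, :, :], axis=-1)
        norms = np.linalg.norm(u_hat, axis=-1) * np.linalg.norm(v, axis=-1)[None, :]
        b = b + dot / (norms + eps)  # cosine agreement, bounded in [-1, 1]
    return v, c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u_hat = rng.normal(size=(6, 3, 4))  # 6 input capsules, 3 output capsules, dim 4
    v, c = cosine_routing(u_hat)
    print(v.shape, c.shape)
```

Under this reading, each logit update is bounded by the number of iterations regardless of vector length, so a long prediction vector cannot dominate the couplings purely through its magnitude.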
Authors
WANG Ying (王颖); GAO Sheng (高胜)
School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318; School of Mechanical Science and Engineering, Northeast Petroleum University, Daqing, Heilongjiang 163318
Source
Journal of University of Electronic Science and Technology of China (《电子科技大学学报》)
EI
CAS
CSCD
PKU Core (北大核心)
2023, No. 3, pp. 423-429 (7 pages)
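Funding
National Natural Science Foundation of China (61702093)
National Key Research and Development Program of China (2018YFE0196000)
Natural Science Foundation of Heilongjiang Province (F2018003)
Heilongjiang Provincial Postdoctoral Special Fund (LBH-Q20077)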
Keywords
capsule network
depthwise separable convolution
speech emotion recognition
transfer learning