Abstract
This paper studies general-purpose audio tagging with an attention-gated convolutional recurrent neural network. The dataset for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Challenge contains too few audio samples, which makes models prone to overfitting. To reduce overfitting, data augmentation and dropout are used. A learnable context gating module helps select the features most relevant to each audio class. A temporal attention mechanism focuses on the frames relevant to an audio event and ignores the irrelevant ones. The proposed model is evaluated on the DCASE 2018 Task 2 dataset: its mean average precision (MAP@3) is 96.1% on the development set and 92.4% on the test set, far higher than that of the competition's baseline system.
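For readers unfamiliar with the MAP@3 metric quoted above, the following is a minimal sketch of how it can be computed, assuming each clip has exactly one ground-truth label and the system submits up to three ranked guesses per clip. The function name `map_at_3` and the example labels are illustrative assumptions, not taken from the paper.

```python
def map_at_3(true_labels, ranked_predictions):
    """Mean average precision at 3, assuming one true label per clip.

    true_labels: list of ground-truth labels, one per clip.
    ranked_predictions: list of per-clip label lists, best guess first.
    A clip scores 1/rank if its true label appears at rank 1, 2, or 3,
    and 0 otherwise; the result is the mean over all clips.
    """
    if len(true_labels) != len(ranked_predictions):
        raise ValueError("one prediction list is required per clip")
    if not true_labels:
        raise ValueError("at least one clip is required")

    total = 0.0
    for truth, preds in zip(true_labels, ranked_predictions):
        # Only the first three guesses count toward the score.
        for rank, label in enumerate(preds[:3], start=1):
            if label == truth:
                total += 1.0 / rank
                break
    return total / len(true_labels)


# Hypothetical example: hits at rank 1 and rank 2, then a miss.
truths = ["Bark", "Cello", "Flute"]
guesses = [
    ["Bark", "Meow", "Cello"],   # correct at rank 1 -> 1.0
    ["Flute", "Cello", "Bark"],  # correct at rank 2 -> 0.5
    ["Oboe", "Bark", "Meow"],    # not in top 3      -> 0.0
]
score = map_at_3(truths, guesses)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Under this definition, a score of 92.4% means the correct label is, on average, ranked at or very near the top of the three guesses.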
Authors
王金甲
崔琳
杨倩
纪绍男
WANG Jinjia; CUI Lin; YANG Qian; JI Shaonan (School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China; Hebei Key Laboratory of Information Transmission and Signal Processing, Yanshan University, Qinhuangdao, Hebei 066004, China)
Source
《复旦学报(自然科学版)》
CAS
CSCD
PKU Core (北大核心)
2020, No. 3, pp. 360-367 (8 pages)
Journal of Fudan University:Natural Science
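Funding
National Natural Science Foundation of China (61473339)
Hebei Province Young Top-Notch Talent Support Program ([2013]17)
Beijing-Tianjin-Hebei Basic Research Cooperation Special Project (F2019203583).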