Abstract
Voice activity detection (VAD) algorithms based on deep neural networks (DNN) ignore the temporal correlation of acoustic features, so their performance degrades markedly in noisy environments. This paper proposes a hybrid network structure that combines a DNN with long short-term memory (LSTM) units for VAD. The dynamic information of the speech frames is further analyzed and exploited, and a cost function based on context information is used together with the DNN-LSTM structure to train the network. The experimental corpus is based on the TIDIGITS speech database, with noise added from the Noisex-92 noise database. The results show that the DNN-LSTM based VAD method outperforms the DNN-based VAD method in different noise environments, and that the new cost function is better suited to the proposed algorithm than the traditional cost function.
Authors
张雪英
牛溥华
高帆
ZHANG Xueying; NIU Puhua; GAO Fan (College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024, China)
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals
2018, Issue 5, pp. 509-515 (7 pages)
Journal of Tsinghua University(Science and Technology)
Funding
National Natural Science Foundation of China (61371193)
National Undergraduate Innovation and Entrepreneurship Training Program (201610112007)