Abstract
To improve the segmentation of naturally occurring overlapping speech in real conversations, a deep convolutional neural network (DCNN) was trained end-to-end on relatively low-level log-scaled Mel-spectrograms computed from monaural audio. The DCNN was trained on real conversational data from the Fisher English Corpus, while its generalizability to ordinary conversational scenarios was maintained and tested. To alleviate severe class imbalance, silence was removed from the training objective and the majority class was sampled uniformly at random during training. In addition, temporal smoothing with the Viterbi algorithm was applied to enhance the final segmentation. On over 91 h of conversations, detection precision exceeded 60% and recall exceeded 29%, demonstrating the applicability of deep learning to this task.
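The Viterbi-based temporal smoothing mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-state model (0 = no overlap, 1 = overlap), the function name `viterbi_smooth`, and the `stay` self-transition probability are all assumptions made here for illustration, and the per-frame probabilities stand in for a hypothetical classifier output.

```python
import math

def viterbi_smooth(probs, stay=0.9):
    """Decode the most likely 0/1 label sequence from per-frame probabilities.

    probs: per-frame probability of the "overlap" class (hypothetical
           classifier output, each value in [0, 1]).
    stay:  assumed probability of keeping the same state between frames.
    """
    if not probs:
        return []
    eps = 1e-12  # guards against log(0)
    log_stay = math.log(stay)
    log_switch = math.log(1.0 - stay)

    def emit(p):
        # Log-scores for state 0 (no overlap) and state 1 (overlap).
        return (math.log(max(1.0 - p, eps)), math.log(max(p, eps)))

    # Uniform prior over the start state (the constant term is dropped).
    score = list(emit(probs[0]))
    back = []  # back[t][s] = best previous state for state s at frame t + 1
    for p in probs[1:]:
        e = emit(p)
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [
                score[prev] + (log_stay if prev == s else log_switch)
                for prev in (0, 1)
            ]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + e[s])
        back.append(ptr)
        score = new_score

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

With a high `stay` value, an isolated one-frame spike in the probabilities tends to be absorbed into the surrounding label, while a sustained run of high probabilities survives as an overlap segment. The transition probability would need to be tuned on held-out data in practice.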
Authors
魏金太
高穹
WEI Jin-tai; GAO Qiong (Department of Information and Art Design, Henan Forestry Vocational College, Luoyang 471002, China; Luoyang Electronic Equipment Testing Center, Luoyang 471003, China)
Source
《中北大学学报(自然科学版)》
CAS
2021, Issue 1, pp. 34-39 (6 pages)
Journal of North University of China(Natural Science Edition)
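Funding
National Natural Science Foundation of China (11404398)
Key Research Project of the Henan Provincial Department of Science and Technology (142102210097)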
Keywords
overlapping speech
deep convolutional neural network
conversation analysis
speech segmentation
class-imbalance