Synthetic Speech Detection Based on Improved Time Delay Neural Network

Abstract: Building on a time delay neural network with a variable kernel mechanism, this paper proposes a neural network architecture with a global multi-scale attention mechanism, together with a fused feature that combines Fbank and Inversed Mel-Frequency Cepstral Coefficients (IMFCC). Experiments are carried out on the ASVspoof 2019 LA dataset, with equal error rate and test-set accuracy as the evaluation metrics. The results show that, with the same acoustic features, the proposed architecture with global multi-scale attention improves recognition accuracy by 5.1% and 4.3% over ECAPA-TDNN and SKA-TDNN, respectively.
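The abstract names equal error rate (EER) as an evaluation metric but does not describe how it was computed. The following is a minimal sketch of one common way to estimate it, not the authors' evaluation code. The function name `equal_error_rate`, the score convention (higher means more likely bona fide), the label encoding, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate the EER and the threshold at which it occurs.

    Assumed conventions (not taken from the paper):
      scores: higher value means "more likely bona fide"
      labels: 1 = bona fide speech, 0 = spoofed (synthetic) speech
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_bonafide = np.sum(labels == 1)
    n_spoof = np.sum(labels == 0)

    # Every distinct score is tried as a decision threshold;
    # an utterance is accepted as bona fide when score >= threshold.
    thresholds = np.unique(scores)
    far = np.array([np.sum((scores >= t) & (labels == 0)) / n_spoof
                    for t in thresholds])   # spoof wrongly accepted
    frr = np.array([np.sum((scores < t) & (labels == 1)) / n_bonafide
                    for t in thresholds])   # bona fide wrongly rejected

    # The EER is the operating point where both error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores, invented purely for illustration.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER indicates a better detector. On a real evaluation set, interpolating between adjacent thresholds would give a smoother estimate than the nearest-point approximation used here.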

Authors: WANG Zhiyi; ZHANG Hongbing (School of Public Security Information Technology, Criminal Investigation Police University of China, Shenyang 110035, China)

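Affiliation: School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China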

Source: Audio Engineering (《电声技术》), 2023, No. 9, pp. 118-120 (3 pages)

Funding: 2023 Fundamental Research Funds for the Central Universities, Major Project Cultivation Program (JYTZD2023150).

Keywords: time-delay neural network; synthetic speech; feature fusion

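Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]; TN912.33 [Electronics and Telecommunications: Communication and Information Systems]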
