Speech Logic Attack Detection Based on CNN-RNN-DNN Network

Abstract: Speech synthesis and voice conversion are becoming the mainstream ways of generating synthetic speech, and such speech poses potential risks to social stability and national security. To further improve the detection accuracy for synthesized and converted (forged) speech, this work approaches the problem from two directions, hybrid network models and feature selection, and proposes three hybrid models built on a CNN-RNN-DNN structure: CNN-LSTM-DNN, CNN-GRU-DNN, and CNN-BiLSTM-DNN. In each model, the convolutional neural network (CNN) part performs downsampling, the recurrent neural network (RNN) part handles the temporal structure of speech, and the deep neural network (DNN) part performs classification. Each hybrid model contains 20 network layers. Experiments were run on six extracted acoustic features. The CNN-LSTM-DNN + MFCC combination performed best, with an equal error rate (EER) of 5.79%, which is 28.43% lower (in relative terms) than the B02 baseline system provided by ASVspoof2019. The three hybrid networks were compared across all six features, and control experiments against four standalone networks were added. The results show that the proposed hybrid models offer stable performance and high accuracy, and that the mel-frequency cepstral coefficient (MFCC) feature and the fused MFCC + linear frequency cepstral coefficient (LFCC) feature are the best suited to these models.
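The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate (spoofed speech accepted as genuine) equals the false rejection rate (genuine speech rejected). The following is a minimal sketch of how an EER can be estimated from detector scores. It is not the authors' evaluation code: the function name `compute_eer`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate from two lists of detector scores.

    Assumption: a higher score means the utterance is more likely bona fide.
    Returns (eer, threshold), where eer is a fraction in [0, 1].
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))

    best = None  # (gap between FAR and FRR, eer estimate, threshold)
    for t in thresholds:
        # False rejection: a bona fide utterance scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed utterance scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finitely many scores FAR and FRR may never be exactly
            # equal, so report their mean at the closest crossing.
            best = (gap, (far + frr) / 2, t)

    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, invented for illustration only (not data from the paper).
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = compute_eer(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

This threshold sweep is quadratic in the number of scores, which is fine for a small example. For an evaluation set the size of ASVspoof2019, a sorted, cumulative-count implementation would be the more practical choice.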

Authors: YANG Hai-tao (杨海涛); WANG Hua-peng (王华朋); CHU Xian-teng (楚宪腾); NIU Jin-lin (牛瑾琳); LIN Nuan-hui (林暖辉); ZHANG Kun-yao (张琨瑶) (Video and Audio Material Examination Department, Criminal Investigation Police University of China, Shenyang 110854, China; Criminal Science and Technology Institute of Guangzhou, Guangzhou 510030, China)

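Affiliations: School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China; Criminal Science and Technology Institute of Guangzhou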

Source: Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2022, No. 18, pp. 7937-7944 (8 pages)

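Funding: National Key Research and Development Program of China (2017YFC0821000); Guangzhou Science and Technology Program (2019030004); Open Fund of the Key Laboratory of Forensic Science, Ministry of Justice (Academy of Forensic Science).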

Keywords: CNN-RNN-DNN; hybrid network model; hybrid acoustic features; equal error rate (EER); ASVspoof2019

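Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]; TP391.4 [Automation and Computer Technology: Computer Application Technology]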