Abstract
Pitch tampering is a common technique in speech forgery and can threaten speaker verification systems. This paper studies the detection of forged speech under noisy and compressed conditions and proposes a pitch-tampering detection algorithm based on an improved capsule network. To enhance robustness, RASTA-PLP (RelAtive SpecTrAl-Perceptual Linear Predictive) and MFCC (Mel-scale Frequency Cepstral Coefficients) features are fused into a new feature, which is fed into an optimized capsule network to detect tampering in audio that has undergone noise addition and compression. Experimental results show that the detection accuracy of the algorithm exceeds 98% in known-noise, unknown-noise, and compression scenarios, and that, compared with some existing algorithms, it achieves higher detection accuracy and robustness.
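The abstract does not say how the two feature sets are fused. As a minimal sketch, assuming frame-wise concatenation (one common way to combine frame-level features, not necessarily the authors' method), the fusion step could look like the following. The function name `fuse_features` and the toy values are hypothetical placeholders; real RASTA-PLP and MFCC matrices would come from a feature extractor.

```python
from typing import List

# Rows are frames, columns are coefficients.
Matrix = List[List[float]]


def fuse_features(rasta_plp: Matrix, mfcc: Matrix) -> Matrix:
    """Concatenate two frame-level feature matrices frame by frame.

    Assumption: fusion is plain concatenation along the coefficient axis.
    If the frame counts differ, both inputs are truncated to the shorter one.
    """
    n_frames = min(len(rasta_plp), len(mfcc))
    return [rasta_plp[t] + mfcc[t] for t in range(n_frames)]


# Toy placeholder data, not real features: 3 frames x 2 coefficients
# and 2 frames x 3 coefficients.
rasta_plp = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
mfcc = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

fused = fuse_features(rasta_plp, mfcc)
print(len(fused), len(fused[0]))  # 2 frames, 5 coefficients per frame
```

Each fused frame then carries the coefficients of both descriptors, and the resulting matrix is what a classifier such as a capsule network would receive as input.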
Authors
DU Haiyun (杜海云); WANG Hongxia (王宏霞)
Sichuan University, Chengdu, Sichuan 610207, China
Source
Communications Technology (《通信技术》)
2022, No. 8, pp. 984-989 (6 pages)
Funding
Supported by the Sichuan Science and Technology Program (2022YFG0320).