Abstract
An impostor can replay a recording of a legitimate user's voice to deceive a speaker recognition system and gain access, which poses a threat to public security. Detecting recaptured speech is therefore of practical urgency, yet published research on the problem remains scarce. This paper proposes an algorithm for detecting recaptured speech. Statistics of MFCC (Mel-Frequency Cepstral Coefficients) serve as the features for SVM (Support Vector Machine) and KNN (K-Nearest Neighbors) classifiers; in addition, the detection performance of an SAE (Sparse Autoencoder) is examined. To simulate realistic recapture scenarios, the experiments test the algorithm across different recording devices, recording distances, and recording environments. The results show that increasing the diversity of recaptured speech in the training data raises the accuracy of the algorithm to 99.67%, indicating good detection performance.
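The feature-and-classifier idea in the abstract can be illustrated with a small sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not say which MFCC statistics are used, so the sketch takes the per-coefficient mean and standard deviation; `fake_mfcc` is a hypothetical stand-in that draws Gaussian frames instead of computing real MFCCs from audio; and the KNN classifier is a plain Euclidean majority vote with an arbitrary `k=3`.

```python
import math
import random
import statistics
from collections import Counter


def mfcc_statistics(frames):
    """Collapse a (num_frames x num_coeffs) MFCC matrix into one fixed-length vector.

    Assumption: per-coefficient mean and standard deviation; the abstract
    does not say which statistics the authors used.
    """
    columns = list(zip(*frames))  # one tuple per cepstral coefficient
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means + stds


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fake_mfcc(rng, offset, num_frames=50, num_coeffs=12):
    """Hypothetical stand-in for a real MFCC front end: Gaussian frames around `offset`."""
    return [
        [rng.gauss(offset, 1.0) for _ in range(num_coeffs)]
        for _ in range(num_frames)
    ]


rng = random.Random(0)

# Toy data: "original" clips centred at 0.0, "recaptured" clips centred at 3.0.
train_x, train_y = [], []
for label, offset in (("original", 0.0), ("recaptured", 3.0)):
    for _ in range(10):
        train_x.append(mfcc_statistics(fake_mfcc(rng, offset)))
        train_y.append(label)

# Classify one unseen clip drawn from the "recaptured" distribution.
query = mfcc_statistics(fake_mfcc(rng, 3.0))
prediction = knn_predict(train_x, train_y, query)
print(prediction)
```

Summarizing each clip by statistics over its frames gives every recording a feature vector of the same length regardless of duration, which is what lets fixed-input classifiers such as KNN or SVM be applied. With real data, the synthetic frames would be replaced by MFCCs extracted from original and recaptured recordings.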
Authors
李山路
王泳
甘俊英
LI Shan-lu; WANG Yong; GAN Jun-ying (School of Information Engineering, Wuyi University, Jiangmen, Guangdong 529020, China; Corresponding Author, School of Electronic and Information, Guangdong Polytechnic Normal University, Guangzhou, Guangdong 510665, China)
Source
《信号处理》
CSCD
Peking University Core Journal
2017, No. 1, pp. 95-101 (7 pages)
Journal of Signal Processing
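Funding
National Natural Science Foundation of China (61672173, 61372193, 61072127)
National Natural Science Foundation of China, Young Scientists Fund (61100168)
Natural Science Foundation of Guangdong Province (S2013010013311, 2014A030313623)
Guangdong Province Distinctive Innovation Project for Regular Universities (2015KTSCX083)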
Keywords
speech recapture detection
social security
Mel-frequency cepstral coefficients
support vector machine
K-nearest neighbors
sparse autoencoder