Detection of replay voice by ResNet and CatBoost

Abstract: Short voice commands carry little audio information, so sentence-level replay voice detection does not suit them. Voice that is recorded at close range and then replayed through a high-quality device is also difficult to detect. To address both problems, a model for word-level replay voice detection is proposed. First, audio frames are selected using the short-time Fourier transform (STFT), low-frequency average energy computation and frame sorting, and the Gammatone frequency cepstral coefficients (GFCC) of these frames are extracted as acoustic features. Next, a residual network (ResNet) model based on a self-attention mechanism further extracts the information in the GFCC and converts it into feature vectors. Finally, the feature vectors are classified with CatBoost to improve detection performance. Experimental results on the POCO dataset show that the proposed method detects replay voice with an accuracy of 87.54% and an equal error rate (EER) of 12.53%, outperforming the baseline and existing methods. On the ASVspoof2019 PA (physical access) dataset, the method achieves an EER of 4.92% and a tandem detection cost function (t-DCF) of 0.1418, which demonstrates that it is also suitable for replay voice detection under various settings.
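The frame selection step in the abstract (STFT, low-frequency average energy, frame sorting) can be illustrated with the minimal sketch below. It is not the authors' code: the frame length (512 samples), hop (256), low-frequency cutoff (100 Hz), number of kept frames `k`, and the function names `stft_power` and `select_frames` are all hypothetical choices, since the paper's actual parameters are not given here. It assumes, as the keyword "pop noise" suggests, that the most informative frames are those with the strongest low-frequency energy.

```python
import numpy as np


def stft_power(signal, frame_len=512, hop=256):
    """Power spectrogram from a Hann-windowed short-time Fourier transform.

    Hypothetical defaults: 512-sample frames with a 256-sample hop.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad very short inputs so at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len // 2 + 1)
    return np.abs(spectrum) ** 2


def select_frames(signal, sr, k=32, cutoff_hz=100.0, frame_len=512, hop=256):
    """Keep the k frames with the highest average low-frequency energy.

    Hypothetical parameters: cutoff_hz and k are illustrative, not the paper's.
    Returns the kept frame indices (in temporal order) and the per-frame
    low-frequency average energy.
    """
    power = stft_power(signal, frame_len, hop)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    low_band = freqs <= cutoff_hz                  # always includes the DC bin
    low_energy = power[:, low_band].mean(axis=1)   # average low-frequency energy per frame
    order = np.argsort(low_energy)[::-1]           # frames sorted by descending energy
    keep = np.sort(order[:k])                      # top-k, restored to temporal order
    return keep, low_energy
```

Restoring temporal order after sorting keeps the selected frames in sequence, so a later feature extractor (GFCC in the paper) can treat them as a time series.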
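The equal error rate reported above is the operating point at which the false acceptance rate (spoofed trials accepted) equals the false rejection rate (bona fide trials rejected). A simple threshold-sweep approximation is sketched below. It is not the official ASVspoof scoring script, it assumes higher scores mean "more likely bona fide", it does not compute the t-DCF, and `compute_eer` is a hypothetical helper name.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    Returns (eer, threshold) at the point where FRR and FAR are closest.
    """
    bona = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.unique(np.concatenate([bona, spoof]))  # sorted candidate thresholds
    frr = np.array([(bona < t).mean() for t in thresholds])   # bona fide wrongly rejected
    far = np.array([(spoof >= t).mean() for t in thresholds])  # spoof wrongly accepted
    idx = np.argmin(np.abs(frr - far))                         # closest FRR/FAR crossing
    return (frr[idx] + far[idx]) / 2, thresholds[idx]
```

For example, bona fide scores `[0.4, 0.6, 0.8, 0.9]` and spoof scores `[0.1, 0.2, 0.5, 0.7]` give an EER of 0.25 at threshold 0.6. With few trials the FRR and FAR curves are step functions, so averaging them at the closest point is only an approximation of the true crossing.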

Authors: SUN Xiaochuan, FU Jingchang, SONG Xiaoting, ZONG Lifang, LI Zhigang (College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China; Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan 063210, China)

Source: Journal of Applied Acoustics (应用声学), 2023, No. 4, pp. 861-870 (10 pages); indexed in CSCD and the Peking University core journal list

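Funding: Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2021088); National Key Research and Development Program of China (2017YFE0135700).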

Keywords: replay voice detection; pop noise; ResNet; CatBoost

CLC number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
