
Speaker Recognition Under Short Utterance Based on Bi-GRU + BFE Model (cited by: 2)
Abstract: Speaker recognition is an important biometric technology and is already widely used. In practice, however, the speech available from a speaker is often short, so improving the short-duration performance of speech features, and thereby achieving high voiceprint recognition accuracy on short utterances, remains a major challenge. To address this, the paper studies a speaker recognition method that combines a bidirectional gated recurrent unit (Bi-GRU) with a block-level feature equalization (BFE) structure. The recurrent network converts Mel-frequency cepstral coefficients (MFCC) into deep temporal features that carry short-term speaker identity information, and the model is trained with a cross-entropy loss function. Experimental results show that the Bi-GRU+BFE model achieves a higher recognition rate on short-utterance speaker recognition than the traditional Gaussian mixture model and other deep network models, and that training efficiency is also greatly improved.
Authors: JIANG Shan (姜珊); ZHANG Erhua (张二华); ZHANG Han (张晗) (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2022, No. 10, pp. 2233-2239 (7 pages)
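Funding: Supported by the 13th Five-Year Plan Equipment Pre-Research Field Fund of the Equipment Development Department of the Central Military Commission (Grant No. 61403120102).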
Keywords: speaker recognition; short utterance; bidirectional gated recurrent unit (Bi-GRU); block-level feature equalization (BFE); Mel-frequency cepstral coefficient (MFCC)
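
The pipeline named in the abstract (MFCC frames passed through a bidirectional GRU and scored with a cross-entropy loss) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' model: every name in it (GRUCell, bi_gru, mean_pool, speaker_posteriors) is hypothetical, random vectors stand in for real MFCC frames, biases are omitted, and only the forward pass and loss are shown. The abstract does not define the BFE structure, so it is not implemented here; plain mean pooling over time is used as a stand-in.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """A single GRU cell without biases (a simplification for brevity)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One (W, U) pair per gate: update z, reset r, candidate n.
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in
             zip(matvec(self.W["n"], x), matvec(self.U["n"], gated_h))]
        # Interpolate between the previous state and the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def bi_gru(frames, fwd, bwd):
    """Run one GRU left-to-right and one right-to-left, then concatenate
    the two hidden states at every time step."""
    h = [0.0] * fwd.hidden_dim
    fwd_states = []
    for x in frames:
        h = fwd.step(x, h)
        fwd_states.append(h)
    h = [0.0] * bwd.hidden_dim
    bwd_states = []
    for x in reversed(frames):
        h = bwd.step(x, h)
        bwd_states.append(h)
    bwd_states.reverse()  # realign with the forward time order
    return [f + b for f, b in zip(fwd_states, bwd_states)]


def mean_pool(states):
    """Average over time. Assumption: a stand-in, NOT the paper's BFE."""
    return [sum(col) / len(states) for col in zip(*states)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    return -math.log(probs[label])


def speaker_posteriors(frames, fwd, bwd, W_out):
    embedding = mean_pool(bi_gru(frames, fwd, bwd))
    return softmax(matvec(W_out, embedding))


rng = random.Random(0)
n_mfcc, hidden, n_speakers = 13, 8, 4  # illustrative sizes, not from the paper
fwd, bwd = GRUCell(n_mfcc, hidden, rng), GRUCell(n_mfcc, hidden, rng)
W_out = rand_matrix(n_speakers, 2 * hidden, rng)
# 50 random vectors standing in for the MFCC frames of one short utterance.
frames = [[rng.gauss(0.0, 1.0) for _ in range(n_mfcc)] for _ in range(50)]
probs = speaker_posteriors(frames, fwd, bwd, W_out)
loss = cross_entropy(probs, label=2)
print(len(probs), loss)
```

Training would adjust the weights to minimize this loss over labeled utterances; the gradient step is left out to keep the sketch short.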
