Abstract
Speaker recognition is an important biometric technology and is widely used. Because the speech obtained from a speaker in practical applications is of limited length, improving the short-duration performance of speech features so that voiceprint recognition reaches high accuracy on short utterances remains a major difficulty. To address this, the paper studies a speaker recognition method that combines a bidirectional gated recurrent unit (Bi-GRU) with a block-level feature equalization (BFE) structure. The recurrent network converts Mel-frequency cepstral coefficients (MFCC) into deep temporal features that carry short-term speaker identity information, and the model is then trained with a cross-entropy loss function. Experimental results show that the Bi-GRU+BFE model achieves a higher recognition rate on short-utterance speaker recognition than the traditional Gaussian mixture model and other deep network models, and that training efficiency is also greatly improved.
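The pipeline named in the abstract (MFCC frames, a bidirectional GRU, a cross-entropy objective over speaker labels) can be illustrated with a minimal pure-Python forward pass. This is only a sketch under stated assumptions, not the authors' model: the record does not describe the BFE structure, so plain mean pooling over time stands in for it, and the dimensions (13 coefficients, 8 hidden units, 4 speakers), the random weights, the omission of bias terms, and all function and class names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """GRU cell without bias terms (a simplification made for brevity)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One input matrix W and one recurrent matrix U per gate:
        # update (z), reset (r), candidate state (n).
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        # The reset gate scales the previous state before the recurrent product.
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # The update gate interpolates between the candidate and the previous state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, frames):
        h = [0.0] * self.hidden_dim
        outputs = []
        for x in frames:
            h = self.step(x, h)
            outputs.append(h)
        return outputs

def bigru(frames, fwd, bwd):
    # Run one cell left-to-right and another right-to-left,
    # then concatenate the two hidden states at each time step.
    f_out = fwd.run(frames)
    b_out = bwd.run(frames[::-1])[::-1]
    return [f + b for f, b in zip(f_out, b_out)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    # Negative log-likelihood of the true speaker label.
    return -math.log(probs[target])

rng = random.Random(0)
n_mfcc, hidden, n_speakers, n_frames = 13, 8, 4, 20  # hypothetical sizes

# Random stand-in for the MFCC frames of one short utterance.
frames = [[rng.gauss(0.0, 1.0) for _ in range(n_mfcc)] for _ in range(n_frames)]

fwd = GRUCell(n_mfcc, hidden, rng)
bwd = GRUCell(n_mfcc, hidden, rng)
features = bigru(frames, fwd, bwd)  # n_frames vectors of size 2 * hidden

# Assumption: mean pooling over time replaces the unspecified BFE structure.
pooled = [sum(col) / n_frames for col in zip(*features)]

W_out = rand_matrix(n_speakers, 2 * hidden, rng)
probs = softmax(matvec(W_out, pooled))
loss = cross_entropy(probs, target=2)  # 2 is an arbitrary speaker index
print(len(features), len(features[0]), loss)
```

Training would minimize this loss over many labeled utterances with gradient descent; the sketch covers only the forward computation for a single utterance.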
Authors
JIANG Shan (姜珊)
ZHANG Erhua (张二华)
ZHANG Han (张晗)
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094
Source
Computer & Digital Engineering (《计算机与数字工程》)
2022, No. 10, pp. 2233-2239 (7 pages)
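Funding
Supported by the 13th Five-Year Plan Equipment Pre-Research Field Fund of the Equipment Development Department of the Central Military Commission (No. 61403120102).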
Keywords
speaker recognition
short utterance
bidirectional gated recurrent unit
block-level feature equalization
Mel-frequency cepstral coefficient