Abstract
Traditional forensic speaker recognition methods model speech data poorly, make feature extraction difficult, and are vulnerable to noise interference. To address these problems, a forensic speaker recognition method based on a convolutional neural network is proposed. The method takes the AlexNet network as its basis and adjusts its parameters. Because ReLU, when used as the activation function, is prone to dead neurons and bias shift, the characteristics of the Tanh and ReLU functions are combined to construct a new TR function that serves as the network's activation function. In addition, to avoid the subjectivity and incompleteness of manually extracted speech features, the speech is converted into spectrograms that serve as the network input. Experimental results show that, with TR as the activation function, the method reaches an accuracy of 92.24% on the forensic speaker recognition dataset and 96.13% on a public flower image dataset, outperforming both the Tanh and ReLU functions in each case.
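The two ingredients named in the abstract, a spectrogram input and a Tanh/ReLU hybrid activation, can be illustrated with a minimal sketch. The abstract gives neither the spectrogram settings nor the TR formula, so everything below is an assumption made for illustration: the frame length, hop size, Hann window, and synthetic test tone are arbitrary, and `tr_sketch` (tanh for negative inputs, identity for non-negative inputs) is a hypothetical stand-in, not the paper's TR function.

```python
import cmath
import math


def log_spectrogram(signal, frame_len=64, hop=32):
    """Log-magnitude short-time Fourier transform of a 1-D signal.

    Returns a list of frames; each frame holds frame_len // 2 + 1
    log-magnitudes, one per non-negative frequency bin.
    Frame length and hop are illustrative choices, not the paper's.
    """
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for clarity; a real pipeline would use an FFT library.
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(math.log1p(abs(coeff)))
        frames.append(row)
    return frames


def tr_sketch(x):
    """Hypothetical Tanh/ReLU hybrid (NOT the paper's TR formula).

    Negative inputs pass through tanh, so they keep a non-zero gradient
    and a negative output; non-negative inputs pass through unchanged,
    as in ReLU.
    """
    return math.tanh(x) if x < 0 else x


# A synthetic 1000 Hz tone sampled at 8 kHz stands in for a speech recording.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(1024)]
spec = log_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

With these settings each frequency bin spans 8000 / 64 = 125 Hz, so the tone's energy concentrates in bin 8 of every frame. In a CNN pipeline the resulting 2-D array would be treated as an image, and a function like `tr_sketch` would be applied element-wise after each convolution.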
Author
南兆营
NAN Zhaoying (Criminal Investigation Police University of China, Shenyang 110854, China)
Source
《电声技术》
2021, No. 2, pp. 23-27, 31 (6 pages in total)
Audio Engineering
Keywords
convolutional neural network
forensic speaker recognition
activation function
spectrogram