
Channel compensation algorithm for speaker recognition based on probabilistic spherical discriminant analysis
Abstract: In speaker recognition, the Probabilistic Linear Discriminant Analysis (PLDA) model is a widely used classification backend. However, the distribution assumption of the Gaussian PLDA model does not accurately fit the true distribution of speaker features. As a result, channel compensation by length normalization, which relies on the Gaussian assumption, destroys the independence of the within-class distribution of speaker features. Gaussian PLDA therefore cannot fully exploit the speaker information contained in the features extracted by the upstream task, and recognition results suffer. To address this problem, a Channel Compensation algorithm for speaker recognition based on Probabilistic Spherical Discriminant Analysis (CC-PSDA) is proposed. It replaces PLDA under the Gaussian assumption with a Probabilistic Spherical Discriminant Analysis (PSDA) model under the von Mises-Fisher (VMF) distribution assumption, combined with a feature transformation method, so that channel compensation no longer affects the independence of the within-class distribution of speaker features. First, so that the speaker features conform to the VMF prior assumption and fit the backend classification model, a nonlinear transformation is applied at the feature level to transform their distribution. Then, exploiting the fact that the PSDA model under the VMF assumption does not destroy the within-class distribution structure of speaker features, the transformed features are defined on a hypersphere of a specific dimension, maximizing the inter-class distance. The model is solved with the Expectation-Maximization (EM) algorithm, which completes the classification task. Experimental results show that, on three test sets, the proposed algorithm achieves a lower recognition equal error rate (EER) than both comparison models, PSDA and Gaussian PLDA. The proposed model can therefore effectively discriminate speaker features and improve recognition performance.
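As a rough illustration of the two ingredients the abstract names, the sketch below length-normalizes a feature vector onto the unit hypersphere and evaluates a von Mises-Fisher log-density there. It is not the authors' CC-PSDA implementation: the function names, the power-series evaluation of the Bessel function, and the fixed number of series terms are all assumptions made for this example, and the series is only adequate for moderate concentration values.

```python
import math

def length_normalize(x):
    """Scale a feature vector to unit Euclidean norm (projection onto the hypersphere)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in x]

def log_bessel_i(order, kappa, terms=500):
    """log of the modified Bessel function I_order(kappa), from its power series.

    Assumption for this sketch: kappa > 0 and small enough that `terms`
    series terms are sufficient (the largest term sits near m = kappa / 2).
    """
    log_half = math.log(kappa / 2.0)
    logs = [
        (2 * m + order) * log_half - math.lgamma(m + 1) - math.lgamma(m + order + 1)
        for m in range(terms)
    ]
    peak = max(logs)  # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in logs))

def vmf_log_density(x, mu, kappa):
    """VMF log-density of unit vector x with mean direction mu and concentration kappa."""
    d = len(x)
    order = d / 2.0 - 1.0
    log_norm = (
        order * math.log(kappa)
        - (d / 2.0) * math.log(2.0 * math.pi)
        - log_bessel_i(order, kappa)
    )
    dot = sum(a * b for a, b in zip(x, mu))
    return log_norm + kappa * dot

# Hypothetical 3-dimensional "speaker features", chosen only for illustration.
mean_direction = [0.0, 0.0, 1.0]
feature = length_normalize([3.0, 0.0, 4.0])
print(vmf_log_density(feature, mean_direction, kappa=2.0))
```

The density depends on the feature only through its inner product with the mean direction, which is why a model built on the VMF assumption works directly with unit-length vectors.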
Authors: JING Weipeng (景维鹏); XIAO Qingxin (肖庆欣); LUO Hui (罗辉) (School of Information and Computer Engineering, Northeast Forestry University, Harbin, Heilongjiang 150006, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 2, pp. 556-562 (7 pages)
Funding: National Natural Science Foundation of China (62101114).
Keywords: speaker recognition; i-vector; Probabilistic Spherical Discriminant Analysis (PSDA); channel compensation; von Mises-Fisher (VMF) distribution; length normalization
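The abstract reports its results as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of that metric follows. It is a simple threshold sweep without interpolation, not the paper's evaluation code; the function name and the toy scores are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping every observed score as a decision threshold.

    A trial is accepted when its score is >= the threshold. The sweep picks
    the threshold where false acceptance and false rejection rates are
    closest and returns their average. Quadratic in the number of trials,
    which is fine for a sketch but slow for large trial lists.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Toy scores, invented for illustration only.
same_speaker = [0.9, 0.6, 0.3, 0.8]
different_speaker = [0.1, 0.7, 0.2, 0.4]
print(equal_error_rate(same_speaker, different_speaker))
```

A lower EER means the backend separates same-speaker and different-speaker trials more cleanly, which is the sense in which the abstract compares CC-PSDA with PSDA and Gaussian PLDA.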
