Abstract
This paper studies the model and feature space of a speaker recognition algorithm based on the relevance vector machine (RVM) and the Gaussian mixture model (GMM). In contrast to most current systems, which discriminate at the level of individual speech frames, the algorithm uses the GMM as an underlying generative model for feature extraction, so that whole utterances are discriminated directly. Two commonly used speech features, mel-frequency cepstral coefficients (MFCC) and instantaneous frequency (IF), are compared in this framework. The algorithm also exploits the high generalization ability, kernel functions, and highly sparse solutions of the RVM to produce more robust classification results and to reduce computational complexity; the sparseness and probabilistic predictions of the RVM make the approach suitable for practical speaker recognition applications. Simulations on the Chains and AHUMADA corpora, two speech databases designed for speaker recognition, show that the proposed algorithm outperforms other systems in reducing the relative error rate and the computational cost on high-dimensional, large-scale data.
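The abstract does not spell out how a GMM turns a variable-length utterance into a single feature vector, so the following is only a minimal sketch of one common construction: mean-only MAP adaptation of a background GMM, followed by stacking the scaled means into a supervector. The function names, the `relevance` factor of 16, and the diagonal-covariance assumption are illustrative choices, not details taken from the paper, and the RVM classifier itself is not shown.

```python
import numpy as np


def gmm_posteriors(frames, weights, means, variances):
    """Posterior probability of each diagonal-covariance component per frame.

    frames: (T, D) feature frames (e.g. MFCC vectors) of one utterance
    weights: (K,), means: (K, D), variances: (K, D) of a background GMM
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                           # (T, K)
    log_joint = np.log(weights) + log_gauss
    log_joint -= log_joint.max(axis=1, keepdims=True)           # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def map_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the background means to one utterance and stack them.

    `relevance` is an assumed adaptation constant, not a value from the paper.
    """
    post = gmm_posteriors(frames, weights, means, variances)    # (T, K)
    n_k = post.sum(axis=0)                                      # soft counts, (K,)
    first_order = post.T @ frames                               # (K, D)
    data_mean = first_order / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]
    adapted = alpha * data_mean + (1.0 - alpha) * means
    # Scale each mean by sqrt(w_k) / sigma_k so that a plain dot product
    # between two supervectors acts as a linear GMM supervector kernel.
    scaled = np.sqrt(weights)[:, None] * adapted / np.sqrt(variances)
    return scaled.reshape(-1)
```

Under these assumptions, every utterance maps to a vector of length K times D regardless of its duration, and the dot product of two such vectors can serve as the kernel value passed to a kernel classifier such as an RVM.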
Source
Journal of University of Electronic Science and Technology of China (《电子科技大学学报》)
EI
CAS
CSCD
Peking University Core (北大核心)
2010, No. 2, pp. 311-315 (5 pages)
Funding
National 863 Program of China (2007AA01Z321)
Key Natural Science Project of the Sichuan Provincial Department of Education (08ZA037)
Keywords
Gaussian distribution
GMM super-vector kernel
instantaneous frequency
relevance vector machine
speech analysis