Abstract
Many computational auditory scene analysis (CASA) systems built on trained models depend on the effectiveness of sample training and on prior knowledge of the participating speakers when separating two-talker speech mixtures. To address this, the paper proposes an unsupervised, clustering-based system for monaural (cochannel) speech separation. The system first analyzes the mixture signal with a multi-pitch tracking algorithm to produce simultaneous streams. It then searches for the optimal assignment of those streams by maximizing the trace of the between-cluster to within-cluster scatter matrix ratio, and finally separates the two talkers. The system requires no trained speaker models and effectively improves the separation of two-talker mixtures, offering a new approach to cochannel speech separation.
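The assignment search described in the abstract can be illustrated with a minimal sketch. The abstract does not say which features represent each simultaneous stream or how the search is carried out, so everything below is an assumption: each stream is summarized by a hypothetical fixed-length feature vector, the objective is taken to be trace(S_W^-1 S_B), and the search is a plain exhaustive enumeration of two-cluster labelings. The helper names (`solve`, `scatter_score`, `best_assignment`) are illustrative only and do not come from the paper.

```python
from itertools import product


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def scatter_score(features, labels):
    """trace(S_W^-1 S_B) for a two-cluster labeling (assumed objective)."""
    groups = [[f for f, l in zip(features, labels) if l == k] for k in (0, 1)]
    if not groups[0] or not groups[1]:
        return float("-inf")  # both talkers must receive at least one stream
    dim = len(features[0])
    # Within-cluster scatter: sum of outer products of deviations from cluster means.
    s_w = [[0.0] * dim for _ in range(dim)]
    for group in groups:
        m = mean(group)
        for f in group:
            d = [a - b for a, b in zip(f, m)]
            for i in range(dim):
                for j in range(dim):
                    s_w[i][j] += d[i] * d[j]
    # With two clusters, S_B = w * diff diff^T where w = n0 * n1 / n,
    # so trace(S_W^-1 S_B) = w * diff^T S_W^-1 diff.
    diff = [a - b for a, b in zip(mean(groups[0]), mean(groups[1]))]
    weight = len(groups[0]) * len(groups[1]) / len(features)
    try:
        x = solve(s_w, diff)
    except ValueError:
        return float("-inf")  # assumption: skip labelings with singular S_W
    return weight * sum(a * b for a, b in zip(diff, x))


def best_assignment(features):
    """Exhaustively search two-cluster labelings; stream 0 is pinned to cluster 0."""
    best_labels, best_score = None, float("-inf")
    for rest in product((0, 1), repeat=len(features) - 1):
        labels = (0,) + rest
        score = scatter_score(features, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score
```

Calling `best_assignment` on a list of per-stream feature vectors returns one 0/1 label per stream together with the objective value. Pinning the first stream to cluster 0 halves the search, since swapping the two talker labels leaves the objective unchanged. Exhaustive enumeration grows exponentially with the number of streams, so a practical system would presumably need a cheaper search strategy.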
Source
Computer and Modernization (《计算机与现代化》), 2014, No. 4, pp. 86-88 (3 pages in total)