Abstract
Speaker diarization is an important research topic in speech processing. To improve its accuracy, a multi-feature fusion model based on Dempster-Shafer (DS) evidence theory is proposed for extracting speaker embeddings. The method uses two combined features, which represent speech more efficiently, as inputs to DenseNet networks, and applies DS evidence theory to fuse the softmax-layer outputs and obtain the speaker embeddings. The model is compared experimentally with DenseNet networks that take a single feature or a combined feature as input. The results show that speaker diarization based on the proposed model extracts the target speaker more accurately.
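The fusion step can be illustrated with a minimal sketch of Dempster's rule of combination. The abstract does not say how the softmax outputs are turned into mass functions or how the embedding is read off the fused result, so the sketch assumes that each softmax output is a basic probability assignment over singleton speaker classes only. Under that assumption the rule reduces to a normalized element-wise product. The function name dempster_combine and the example values are hypothetical and are not taken from the paper.

```python
def dempster_combine(m1, m2):
    """Combine two mass vectors with Dempster's rule.

    Assumption: all mass sits on singleton classes (for example, two
    softmax outputs over the same speaker classes), so the only
    non-conflicting products are those where both sources name the
    same class.
    """
    if len(m1) != len(m2):
        raise ValueError("mass vectors must have the same length")

    # Products where both sources agree on the same class.
    agreement = [a * b for a, b in zip(m1, m2)]

    # This sum equals 1 - K, where K is the conflict mass.
    total = sum(agreement)
    if total <= 0:
        raise ValueError("sources are in total conflict; rule undefined")

    # Renormalize so the fused masses sum to 1.
    return [x / total for x in agreement]


# Hypothetical softmax outputs of two branches, one per combined feature.
branch_a = [0.6, 0.3, 0.1]
branch_b = [0.5, 0.4, 0.1]
fused = dempster_combine(branch_a, branch_b)
```

When the two branches agree on the most likely class, the fused mass for that class is higher than in either input, which is the effect such a fusion relies on.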
Source
《科技创新与应用》
2023, No. 23, pp. 108-111 (4 pages)
Technology Innovation and Application