Abstract
This paper proposes a training method for a discriminative deep belief network (DDBN) based on the Fisher criterion, operating on the super-vector feature space of speech signals. The method yields speaker codebook vector features superior to those obtained from a traditional deep belief network (DBN), and these codebook features are then used to cluster and segment multi-speaker audio. Experiments on multi-speaker speech generated from the TIMIT database show that the DDBN speaker segmentation algorithm based on the Fisher criterion performs markedly better than the traditional Bayesian information criterion (BIC) method and the DBN method.
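As a rough illustration of what a Fisher criterion measures, the sketch below computes the generic trace-ratio form, tr(S_b) / tr(S_w), of between-speaker scatter to within-speaker scatter over a set of feature vectors. The abstract does not state the exact objective used to train the DDBN, so this form, the function name `fisher_ratio`, and the toy data are assumptions for illustration only, not the authors' implementation.

```python
from collections import defaultdict


def fisher_ratio(features, labels):
    """Generic trace-ratio Fisher criterion: tr(S_b) / tr(S_w).

    Hypothetical helper for illustration; the paper's actual DDBN
    objective is not given in the abstract.
    """
    # Group feature vectors by speaker label.
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[y].append(x)

    dim = len(features[0])
    n = len(features)
    global_mean = [sum(x[i] for x in features) / n for i in range(dim)]

    between = 0.0  # trace of the between-class scatter matrix
    within = 0.0   # trace of the within-class scatter matrix
    for members in groups.values():
        mean = [sum(x[i] for x in members) / len(members) for i in range(dim)]
        between += len(members) * sum(
            (m - g) ** 2 for m, g in zip(mean, global_mean)
        )
        within += sum(
            sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) for x in members
        )

    if within == 0.0:
        raise ValueError("within-class scatter is zero; ratio is undefined")
    return between / within


# Toy 2-D "speaker features": well-separated speakers score higher.
separated = fisher_ratio([[0, 0], [2, 0], [10, 0], [12, 0]], ["a", "a", "b", "b"])
overlapping = fisher_ratio([[0, 0], [2, 0], [1, 0], [3, 0]], ["a", "a", "b", "b"])
print(separated, overlapping)
```

A larger ratio means the speakers' features are compact within each speaker and far apart between speakers, which is the property a discriminative training criterion of this kind is meant to encourage.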
Source
Journal of Tsinghua University (Science and Technology)
EI
CAS
CSCD
PKU Core
2013, No. 6, pp. 804-807, 812 (5 pages)
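Funding
Key Science and Technology Program of the Beijing Municipal Education Commission (KZ201110005005)
National Natural Science Foundation of China (61072089)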