Research on Speaker Classification Method Based on Acoustic Merging Feature
Abstract A speaker classification system segments audio data and groups the segments by speaker. Extracting multi-delay features from multiple distance microphones for each speaker can further improve system performance, but the dimension of the multi-delay feature vector grows rapidly as the number of microphones increases. To address this problem while preserving the manifold structure of the features and reducing computational cost, this paper proposes a multi-component discriminant locality preserving projections algorithm based on acoustic merging features from multiple distance microphones. A support vector machine classifier is used to train and test a two-speaker classification system, achieving speaker classification in meeting scenarios. Experimental results show that, compared with traditional algorithms such as DLPP, the proposed algorithm performs better on most data sets and reduces the Diarization Error Rate (DER) to below 20%.
Source Computer Engineering (《计算机工程》), CAS, CSCD, 2013, No. 8, pp. 1-4 (4 pages)
Funding National Natural Science Foundation of China (61105017); Beijing Natural Science Foundation (KZ201110005005)
Keywords speaker classification; multiple distance microphone; multi-delay feature; acoustic merging feature; multi-component discriminant locality preserving projection; Diarization Error Rate (DER)
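
The abstract reports results as a Diarization Error Rate (DER). As a rough illustration of what that figure measures, the sketch below computes a simplified frame-level DER: missed speech, false-alarm speech, and speaker confusion, divided by the amount of reference speech. This is not the paper's scoring procedure. The function name `frame_der`, the per-frame label representation, the absence of a forgiveness collar, and the one-speaker-per-frame restriction are all assumptions made for the example.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Simplified frame-level diarization error rate (illustrative only).

    reference, hypothesis: equal-length sequences of speaker labels,
    with None marking non-speech frames. Assumes at most one speaker
    per frame (no overlap) and no forgiveness collar.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have equal length")
    ref_speech = sum(1 for r in reference if r is not None)
    if ref_speech == 0:
        raise ValueError("reference contains no speech frames")

    pairs = list(zip(reference, hypothesis))
    # Speech in the reference that the system labeled as non-speech.
    missed = sum(1 for r, h in pairs if r is not None and h is None)
    # Non-speech in the reference that the system labeled as speech.
    false_alarm = sum(1 for r, h in pairs if r is None and h is not None)
    # Frames where both sides contain speech; only these can be confused.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_ids = sorted({r for r in reference if r is not None}, key=str)
    hyp_ids = sorted({h for h in hypothesis if h is not None}, key=str)
    # Pad with dummy targets so surplus hypothesis labels can stay unmatched.
    extra = max(0, len(hyp_ids) - len(ref_ids))
    targets = ref_ids + [object() for _ in range(extra)]

    # System labels are arbitrary, so try every one-to-one mapping onto
    # reference labels and keep the one with the most agreeing frames.
    best_correct = 0
    for perm in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, perm))
        correct = sum(1 for r, h in both if mapping[h] == r)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / ref_speech


if __name__ == "__main__":
    # Hypothetical two-speaker example: 9 speech frames in the reference,
    # with 1 missed frame, 1 false-alarm frame, and 1 confused frame.
    ref = ["A", "A", "A", "B", "B", None, "B", "B", "A", "A"]
    hyp = ["x", "x", "y", "y", "y", "y", None, "y", "x", "x"]
    print(f"DER = {frame_der(ref, hyp):.3f}")  # DER = 0.333
```

The exhaustive search over label mappings is only practical for a handful of speakers, which fits the two-speaker setting described in the abstract. A full evaluation would normally work on timed segments and handle overlapping speech.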