Speaker retrieval based on deep speaker vector
Abstract Shallow features do not characterize speakers effectively, which leads to low speaker retrieval rates. To address this, a speaker retrieval method based on deep speaker vectors is proposed. First, a multilayer deep feature extractor is built layer by layer from restricted Boltzmann machines (RBMs) and used to extract deep speaker features. Second, a deep speaker vector is constructed for each speaker from these deep features. Finally, the target speaker is retrieved by computing the minimum distance between the deep speaker vector of the query speaker and the deep features of the speakers in the retrieval library. Experimental results show that, with deep features, most target speakers can be retrieved using deep speaker vectors. The retrieval rates of the first and second layers are lower than that of mel-frequency cepstral coefficients (MFCC), and the retrieval rate of the third layer equals that of MFCC. As the number of layers increases, the retrieval rate first rises and then falls, peaking at a depth of 7 layers, while retrieval time increases nonlinearly.
Source Journal of Huazhong University of Science and Technology (Natural Science Edition), 2015, No. 7, pp. 62-65 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
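Funding National Natural Science Foundation of China (61301300); China Postdoctoral Science Foundation (2013M531850); Fundamental Research Funds for the Central Universities (2013ZM0097).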
Keywords deep feature; deep speaker vector; minimal distance; speaker retrieval; retrieval rate
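
The retrieval pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify how frame-level deep features are pooled into a deep speaker vector or which distance is used, so the mean pooling and Euclidean distance below are assumptions, as are all function names, the toy weights, and the toy data. The layer weights stand in for RBMs that would already have been trained layer by layer; training itself is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rbm_up(frame, weights, bias):
    """Hidden-unit activation probabilities of one (pre-trained) RBM for one frame."""
    return [sigmoid(sum(w * v for w, v in zip(row, frame)) + b)
            for row, b in zip(weights, bias)]

def deep_features(frames, layers):
    """Pass every frame through the stacked layers; layers = [(weights, bias), ...]."""
    features = []
    for frame in frames:
        hidden = frame
        for weights, bias in layers:
            hidden = rbm_up(hidden, weights, bias)
        features.append(hidden)
    return features

def speaker_vector(features):
    """Assumed pooling: average the frame-level deep features of one speaker."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_frames, library, layers):
    """Return the library speaker whose deep features lie closest to the query vector."""
    query_vec = speaker_vector(deep_features(query_frames, layers))
    scores = {
        name: min(euclidean(query_vec, f) for f in deep_features(frames, layers))
        for name, frames in library.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical toy setup: one 2-unit layer and two library speakers.
layers = [([[4.0, 0.0], [0.0, 4.0]], [-2.0, -2.0])]
library = {
    "spk_a": [[0.0, 0.1], [0.1, 0.0]],
    "spk_b": [[1.0, 0.9], [0.9, 1.0]],
}
query = [[0.05, 0.05], [0.0, 0.0]]

best, scores = retrieve(query, library, layers)
print(best)
```

In this toy setup the query frames lie near those of `spk_a`, so that speaker has the smallest distance. A deeper extractor would simply add more `(weights, bias)` pairs to `layers`, which also adds per-frame computation at retrieval time.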