Speaker Retrieval Based on Deep Speaker Vectors


Abstract: Shallow features cannot characterize speakers effectively, which leads to low speaker retrieval rates. To address this problem, a speaker retrieval method based on deep speaker vectors is proposed. First, a multi-layer deep feature extractor is built layer by layer from restricted Boltzmann machines (RBMs) to extract deep speaker features. Second, a deep speaker vector is built for each speaker from these deep features. Finally, the target speaker is retrieved by computing the minimum distance between the deep speaker vector of the query speaker and the deep features of the speakers in the retrieval library. Experimental results show that, with deep features, most target speakers can be retrieved using deep speaker vectors. The retrieval rates of the first and second layers are lower than that of mel-frequency cepstral coefficients (MFCC), and the retrieval rate of the third layer equals that of MFCC. As the number of layers increases, the retrieval rate first rises and then falls, peaking at a depth of 7 layers, while retrieval time increases nonlinearly.
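The three steps in the abstract (layer-wise feature extraction, speaker vector construction, minimum-distance retrieval) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the function names are hypothetical, the layer weights are toy values standing in for RBM-pretrained parameters (the training itself is not shown), the speaker vector is assumed to be the mean of the per-frame deep features, and Euclidean distance is assumed because the abstract does not name a metric.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def deep_features(frame, layers):
    """Propagate one feature frame through stacked sigmoid layers.

    layers is a list of (weights, biases); weights[j] is the weight row of
    hidden unit j. In the paper these would come from RBM pretraining.
    """
    h = frame
    for weights, biases in layers:
        h = [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(weights, biases)]
    return h


def speaker_vector(frames, layers):
    # Assumption: the deep speaker vector is the mean deep feature over frames.
    feats = [deep_features(f, layers) for f in frames]
    return [sum(col) / len(feats) for col in zip(*feats)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_vector, library):
    """Return (speaker_id, distance) with the smallest distance to the query.

    library maps a speaker id to a list of that speaker's deep feature vectors.
    """
    best_id, best_dist = None, math.inf
    for speaker_id, feats in library.items():
        dist = min(euclidean(query_vector, f) for f in feats)
        if dist < best_dist:
            best_id, best_dist = speaker_id, dist
    return best_id, best_dist


# Toy example: one identity-weight layer and two library speakers.
toy_layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
toy_library = {
    "A": [deep_features([2.0, -2.0], toy_layers)],
    "B": [deep_features([-2.0, 2.0], toy_layers)],
}
query = speaker_vector([[1.8, -2.1], [2.1, -1.9]], toy_layers)
match_id, match_dist = retrieve(query, toy_library)
```

In this toy setup the query frames lie close to speaker "A", so `retrieve` selects that entry. A deeper extractor would simply add more `(weights, biases)` pairs to `toy_layers`, which also makes each lookup more expensive.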

Authors: Li Wei (李威), Yang Jichen (杨继臣), He Qianhua (贺前华), Li Yanxiong (李艳雄)

Affiliation: School of Electronic and Information Engineering, South China University of Technology

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2015, No. 7, pp. 62-65 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.

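Funding: National Natural Science Foundation of China (61301300); China Postdoctoral Science Foundation (2013M531850); Fundamental Research Funds for the Central Universities (2013ZM0097)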

Keywords: deep feature; deep speaker vector; minimal distance; speaker retrieval; retrieval rate

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]




