Cross-Modal Unsupervised Speaker Recognition in Film and TV Drama

Original title: 基于跨模态的无监督影视剧说话人识别

Abstract: The rapid growth in the volume of film and TV drama poses a major challenge for managing it effectively, and role recognition is of great significance for content management. Traditional role recognition mainly relies on supervised learning, which depends on the quality of the training samples, yet sufficient training samples are generally hard to obtain in practice. For role recognition in film and TV drama, this paper proposes a cross-modal unsupervised speaker recognition method. First, audio clustering based on acoustic features and temporal proximity yields an audio label sequence corresponding to the clustering result. Second, script parsing yields a text label sequence recording the speaker, the spoken content, and the speaking time. Finally, the audio sequence and the text sequence are matched across modalities: a surjection is constructed and solved for the minimum edit (Levenshtein) distance, which accomplishes speaker recognition. Experimental results show that when the training set is small, the method achieves a higher recognition rate than the supervised method.
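The final matching step can be illustrated with a minimal sketch. The abstract does not say how the surjection is searched for or in which direction it runs, so the code below assumes a map from audio cluster labels onto script speaker names and finds it by brute-force enumeration, which is only practical for a handful of clusters. The function names `levenshtein` and `best_surjection` and the toy sequences are hypothetical, not taken from the paper.

```python
from itertools import product


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_surjection(audio_seq, text_seq):
    """Assumed search: try every map from clusters to speakers, keep only
    surjective ones, and return the map with the smallest edit distance
    between the relabelled audio sequence and the script sequence."""
    clusters = sorted(set(audio_seq))
    speakers = sorted(set(text_seq))
    best_map, best_dist = None, None
    for assignment in product(speakers, repeat=len(clusters)):
        if set(assignment) != set(speakers):
            continue  # not surjective: some speaker is left uncovered
        mapping = dict(zip(clusters, assignment))
        relabelled = [mapping[c] for c in audio_seq]
        dist = levenshtein(relabelled, text_seq)
        if best_dist is None or dist < best_dist:
            best_map, best_dist = mapping, dist
    return best_map, best_dist


# Hypothetical toy data: cluster ids from audio, speaker names from a script.
audio_seq = [0, 1, 0, 2, 1, 0]
text_seq = ["A", "B", "A", "A", "B", "A"]
mapping, dist = best_surjection(audio_seq, text_seq)
```

In this toy case the audio clustering over-segments speaker A into clusters 0 and 2, and the surjection merges them back under one name. If there are fewer clusters than speakers, no surjection exists and the sketch returns `(None, None)`.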

Authors: 冯骋, 库天锡, 杨卫星, 李雪蒙, 谭小琼, 梁超

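Affiliations: National Multimedia Engineering Technology Research Center, Wuhan University; School of Computer Science, Wuhan University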

Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2016, No. 5, pp. 132-135, 147 (5 pages)

Funding: Key Program of the National Natural Science Foundation of China (61231015)

Keywords: speaker recognition; speaker clustering; Levenshtein distance (edit distance); Gaussian mixture model; sequence alignment

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

