
Efficient Audio-visual Cross-modal Speaker Tagging via Supervised Joint Correspondence Auto-encoder

Cited by: 2
Abstract: Cross-modal speaker tagging aims to learn the latent relationship between a speaker's different biometric traits so that they can be matched and annotated against each other, with potential applications in a wide range of human-computer interaction scenarios. To bridge the "semantic gap" between the face and audio modalities, this paper presents an efficient supervised joint correspondence auto-encoder that links faces to their audio counterparts, whereby a speaker can be tagged across modalities. First, a Convolutional Neural Network (CNN) and a Deep Belief Network (DBN) extract discriminative features from the face images and the audio samples, respectively. Then, building on the joint auto-encoder model, a new supervised cross-modal neural network is proposed that embeds a softmax regression model to preserve both inter-modal and intra-modal similarities between samples. This model is further extended into three kinds of supervised joint correspondence auto-encoders that mine the latent relationships between the heterogeneous face and audio features, so that faces and voices can be efficiently annotated with each other. Experimental results show that the proposed models perform cross-modal speaker tagging effectively and are robust to facial pose variations and sample diversity.
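The kind of objective the abstract describes can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the `face` and `voice` arrays stand in for precomputed CNN and DBN feature vectors; each modality gets one sigmoid encoder layer and a linear decoder; the two code layers are tied by a squared-distance correspondence term; a shared softmax classifier over speaker labels supplies the supervision; and the weights `alpha` and `beta` are hypothetical. The abstract does not say how its three models differ, so the `variant` switch (self-, cross-, or full reconstruction) is an assumption borrowed from common correspondence auto-encoder designs. All function and parameter names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(logits):
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def mean_sq(a, b):
    # Per-sample squared Euclidean error, averaged over the batch.
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))


def init_params(d_face, d_voice, d_code, n_speakers, seed=0):
    # Hypothetical one-layer encoder/decoder per modality plus a shared
    # softmax classifier on the code layer.
    rng = np.random.default_rng(seed)

    def w(m, n):
        return rng.normal(0.0, 0.1, size=(m, n))

    return {
        "enc_f": w(d_face, d_code), "dec_f": w(d_code, d_face),
        "enc_v": w(d_voice, d_code), "dec_v": w(d_code, d_voice),
        "cls": w(d_code, n_speakers),
    }


def encode(p, face, voice):
    # Map face and voice features into the shared code space.
    return sigmoid(face @ p["enc_f"]), sigmoid(voice @ p["enc_v"])


def loss(p, face, voice, labels, variant="self", alpha=0.2, beta=0.1):
    """Objective for a batch of paired (face, voice) samples.

    variant: "self"  -> each code reconstructs its own modality
             "cross" -> each code reconstructs the other modality
             "full"  -> both self- and cross-reconstruction (assumed variants)
    """
    if variant not in ("self", "cross", "full"):
        raise ValueError(f"unknown variant: {variant}")
    zf, zv = encode(p, face, voice)
    recon = 0.0
    if variant in ("self", "full"):
        recon += mean_sq(zf @ p["dec_f"], face) + mean_sq(zv @ p["dec_v"], voice)
    if variant in ("cross", "full"):
        recon += mean_sq(zf @ p["dec_v"], voice) + mean_sq(zv @ p["dec_f"], face)
    # Correspondence term: codes of a paired face and voice should coincide
    # (inter-modal similarity).
    corr = mean_sq(zf, zv)
    # Supervision: a shared softmax classifier must recover the speaker label
    # from either code, encouraging same-speaker codes to group together
    # (one way to preserve intra-modal similarity).
    rows = np.arange(labels.shape[0])
    ce = 0.0
    for z in (zf, zv):
        prob = softmax(z @ p["cls"])
        ce -= float(np.mean(np.log(prob[rows, labels] + 1e-12)))
    return (1 - alpha) * recon + alpha * corr + beta * ce


def tag_faces_with_voices(p, faces, voices, voice_names):
    # Cross-modal tagging: give each face the name attached to the nearest
    # voice in code space (squared Euclidean distance).
    zf = sigmoid(faces @ p["enc_f"])
    zv = sigmoid(voices @ p["enc_v"])
    dist = ((zf[:, None, :] - zv[None, :, :]) ** 2).sum(axis=2)
    return [voice_names[j] for j in dist.argmin(axis=1)]
```

The sketch only evaluates the objective and the nearest-neighbour tagging step; in practice the parameters would be fitted by gradient descent on `loss`, for example with an automatic-differentiation library. Tagging voices with face identities is the symmetric operation with the roles of the two encoders swapped.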
Authors: LIU Xin, LI Heyang, ZHONG Bineng, DU Jixiang (Institute of Computer Science and Technology, Huaqiao University, Xiamen 361021, China; Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Xiamen 361021, China)
Source: Journal of Electronics & Information Technology (《电子与信息学报》), 2018, No. 7, pp. 1635-1642 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
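Funding: National Natural Science Foundation of China (61673185, 61572205, 61673186); Natural Science Foundation of Fujian Province (2017J01112); Huaqiao University Cultivation Program for Young and Middle-aged Innovative Talents (ZQN-309)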
Keywords: Cross-modal speaker tagging; Supervised joint correspondence auto-encoder; Softmax regression; Supervised neural network model

