Journal Article

Audio-visual Speaker Tracking Based on Dynamic Bayesian Network

Cited by: 7
Abstract: Multi-sensor data fusion is applied to the speaker tracking problem, and a novel audio-visual speaker tracking approach based on a dynamic Bayesian network is proposed. Exploiting the complementarity and redundancy between a speaker's speech and image, three perception methods are used to acquire measurements related to the speaker's position: sound source localization with a microphone array, face detection based on skin color, and mutual-information maximization based on audio-visual synchronization. Within the dynamic Bayesian network framework, particle filtering fuses these measurements, and Bayesian inference yields effective speaker tracking; information entropy theory is further used to dynamically manage the three perception modalities and improve the overall performance of the tracking system. Experiments with real-world data demonstrate that the proposed method tracks the speaker robustly even in the presence of perturbing factors such as high room reverberation and video occlusions.
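As a rough illustration of the fusion step the abstract describes, the sketch below runs a 1-D particle filter that multiplies Gaussian likelihoods from two simulated modalities (a noisier audio position measurement and a more precise visual one) and records the particle-weight entropy each step, in the spirit of the paper's entropy-based perception management. All noise levels, function names, and the 1-D setting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_likelihood(z, particles, sigma):
    """Unnormalized Gaussian likelihood of measurement z at each particle."""
    return np.exp(-0.5 * ((z - particles) / sigma) ** 2)

def track(true_path, audio_sigma=0.15, video_sigma=0.08, n=1000):
    """Fuse two noisy position measurements per step with a particle filter.

    Assumed setup: 1-D speaker position, random-walk motion model, and
    independent audio/visual measurements simulated from the true path.
    """
    particles = rng.uniform(-1.0, 1.0, n)       # initial cloud over the "room"
    estimates, entropies = [], []
    for x in true_path:
        particles = particles + rng.normal(0.0, 0.05, n)  # motion model
        # Simulated measurements: audio (e.g. TDOA-based) is noisier than video
        z_audio = x + rng.normal(0.0, audio_sigma)
        z_video = x + rng.normal(0.0, video_sigma)
        # Fusion: product of the per-modality likelihoods
        w = (gaussian_likelihood(z_audio, particles, audio_sigma)
             * gaussian_likelihood(z_video, particles, video_sigma))
        w /= w.sum()
        # Weight entropy: a proxy for how informative this step's measurements were
        entropies.append(float(-np.sum(w * np.log(w + 1e-12))))
        estimates.append(float(np.sum(w * particles)))
        # Systematic resampling
        c = np.cumsum(w)
        c[-1] = 1.0                             # guard against floating-point drift
        idx = np.searchsorted(c, (rng.random() + np.arange(n)) / n)
        particles = particles[idx]
    return np.array(estimates), np.array(entropies)

path = np.linspace(-0.5, 0.5, 40)               # speaker drifts across the room
est, ent = track(path)
print(f"mean abs tracking error: {np.mean(np.abs(est - path)):.3f}")
```

In the paper's setting each modality would contribute a real likelihood (microphone-array localization, skin-color face detection, audio-visual mutual information) rather than a simulated Gaussian, and the entropy signal would gate which modalities are queried at each step.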
Source: 《自动化学报》 (Acta Automatica Sinica), EI / CSCD / Peking University Core, 2008, No. 9, pp. 1083-1089 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (60772161, 60372082)
Keywords: speaker tracking, dynamic Bayesian network, particle filter, microphone array

