Abstract
To enable real-time analysis and retrieval of TV programs, this paper analyzes MPEG audio directly in the compressed domain. The algorithm has three steps. First, the MPEG audio stream is segmented using compressed-domain features. Second, the resulting clips are coarsely classified, in a hierarchical manner, into three basic classes: music, speech, and other. Third, because speaker identity is an important retrieval cue in speech, text-independent speaker identification is performed with hidden Markov models (HMMs), and the identified speaker identities are used to label the speech segments and the corresponding video.
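The third step, HMM-based text-independent speaker identification, can be illustrated with a minimal sketch. The abstract does not state the model topology, emission densities, features, or training procedure. This sketch therefore assumes one pre-trained HMM per enrolled speaker, each with diagonal-Gaussian emissions over per-frame feature vectors (for example, vectors derived from MPEG subband data). It scores a speech segment with the forward algorithm in the log domain and picks the speaker whose model gives the highest likelihood. `GaussianHMM` and `identify_speaker` are hypothetical names, and training (e.g. Baum-Welch) is not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


def _log(p: float) -> float:
    # log that maps zero probability to -inf instead of raising
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(xs: Sequence[float]) -> float:
    # numerically stable log(sum(exp(x)))
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def _log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    # log N(x; mean, diag(var)) for one feature frame
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))


@dataclass
class GaussianHMM:
    """Hypothetical per-speaker HMM with diagonal-Gaussian emissions (assumed pre-trained)."""
    start: List[float]            # initial state probabilities pi_i
    trans: List[List[float]]      # transition probabilities a_ij
    means: List[List[float]]      # per-state emission means
    variances: List[List[float]]  # per-state diagonal variances

    def log_likelihood(self, frames: Sequence[Sequence[float]]) -> float:
        """log P(frames | model), computed with the log-domain forward algorithm."""
        if not frames:
            raise ValueError("need at least one frame")
        n = len(self.start)
        log_a = [[_log(p) for p in row] for row in self.trans]
        # initialization: alpha_1(i) = pi_i * b_i(x_1)
        alpha = [_log(self.start[i]) + _log_gauss_diag(frames[0], self.means[i], self.variances[i])
                 for i in range(n)]
        # induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(x_t)
        for x in frames[1:]:
            alpha = [_logsumexp([alpha[i] + log_a[i][j] for i in range(n)])
                     + _log_gauss_diag(x, self.means[j], self.variances[j])
                     for j in range(n)]
        # termination: sum over final states
        return _logsumexp(alpha)


def identify_speaker(frames: Sequence[Sequence[float]], models: Dict[str, GaussianHMM]) -> str:
    """Maximum-likelihood decision: return the speaker whose HMM best explains the frames."""
    return max(models, key=lambda name: models[name].log_likelihood(frames))


if __name__ == "__main__":
    # Toy two-speaker example with 1-D features (illustrative parameters only)
    left_right = [[0.7, 0.3], [0.0, 1.0]]
    models = {
        "A": GaussianHMM([1.0, 0.0], left_right, [[0.0], [1.0]], [[1.0], [1.0]]),
        "B": GaussianHMM([1.0, 0.0], left_right, [[5.0], [6.0]], [[1.0], [1.0]]),
    }
    print(identify_speaker([[0.1], [0.9], [0.2]], models))
```

Because the speaker models are compared only through their likelihoods on the same segment, new speakers can be enrolled by adding a model without retraining the others. The identified name would then serve as the label attached to the speech segment and its video.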
Source
Pattern Recognition and Artificial Intelligence, 2002, No. 1, pp. 21-27 (7 pages)
Indexed in: EI, CSCD, Peking University Core Journals
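Funding
National Natural Science Foundation of China (grants 69803009 and 69733030)
Ministry of Education Foundation for Excellent Young Teachers
Support Program for Key Teachers in Higher Education Institutions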