Abstract
As one of the media that make up multimedia, audio carries rich audiovisual semantic information. However, current multimedia retrieval relies mainly on visual information and largely ignores audio. To address this gap, this paper presents a prototype audio semantic retrieval system in which the audio stream is processed hierarchically. First, based on features such as short-time energy, zero-crossing rate, and fundamental frequency energy ratio, the audio stream is coarsely segmented into four basic classes: silence, harmonic music, dialog, and environmental sounds. Then, because environmental sounds carry many implied semantics, they are further segmented at a fine level, and a trained hidden Markov model (HMM) is used to represent each type of environmental sound for semantic retrieval. Experimental data show that this approach to audio retrieval works well.
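The two simplest features named in the abstract, short-time energy and zero-crossing rate, can be illustrated with a minimal sketch. This is not the paper's implementation: the frame length, hop size, the `is_silence` helper, and its energy threshold are all hypothetical choices made for illustration, and the fundamental frequency energy ratio and the HMM stage are not covered.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into frames, dropping any ragged tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_silence(frame, energy_threshold=1e-4):
    # Hypothetical threshold; the abstract does not state the value used.
    return short_time_energy(frame) < energy_threshold

# Toy input: 0.1 s of digital silence, then 0.1 s of a 440 Hz tone at 8 kHz.
RATE = 8000
quiet = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(800)]

frames = frame_signal(quiet + tone, frame_len=200, hop=200)
labels = ["silence" if is_silence(f) else "sound" for f in frames]
```

In a sketch like this, low-energy frames are set aside as silence first, and the remaining frames would then be passed to further tests (for example on zero-crossing rate) to separate the other coarse classes.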
Source
《模式识别与人工智能》
EI
CSCD
Peking University Core Journals (北大核心)
2001, Issue 1, pp. 104-108 (5 pages)
Pattern Recognition and Artificial Intelligence
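Funding
National Natural Science Foundation of China
Ministry of Education Fund for Excellent Young Teachers
Funding Program for Key Teachers in Institutions of Higher Education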