Abstract
Content-based audio stream segmentation is an important and difficult problem in multimedia data analysis. To segment accurately and on-line, most conventional algorithms are based on small-scale audio classification, but they commonly produce too many false segmentation points, which seriously degrades their practical usefulness. The authors' experiments show that large-scale audio clips are classified with markedly higher accuracy than small-scale clips, and that this trend is independent of the choice of classifier. Based on this observation, and with the aim of reducing false segmentation points, the paper proposes a new framework for audio stream segmentation. First, a rough segmentation step based on large-scale audio classification preserves the integrity of each segment's content, so that consecutive audio of the same class is not split into separate pieces. Then a segmentation point evaluation function is defined and used in a fine segmentation step to locate the segmentation points precisely within the boundary regions found by the rough step. Experimental results show that the method locates segmentation points fairly precisely and removes nearly 3/4 of the false segmentation points produced by the conventional method based on small-scale audio classification, while preserving a low miss rate.
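The two-step idea described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the majority-vote `majority` function stands in for a trained classifier, the window sizes are arbitrary, and the `score` function is a hypothetical substitute for the paper's segmentation point evaluation function, which the abstract does not define.

```python
from collections import Counter
from typing import Callable, List, Sequence


def majority(clip: Sequence[str]) -> str:
    """Stand-in classifier (assumption): most common per-frame label in the clip."""
    return Counter(clip).most_common(1)[0][0]


def change_points(labels: Sequence[str], step: int = 1) -> List[int]:
    """Frame indices at which consecutive window labels differ."""
    return [i * step for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def two_stage_segment(
    frames: Sequence[str],
    classify: Callable[[Sequence[str]], str],
    large: int,
    small: int,
) -> List[int]:
    """Rough segmentation on large windows, then refinement in boundary regions."""
    if not 0 < small < large:
        raise ValueError("expected 0 < small < large")
    starts = range(0, len(frames), large)
    # Step 1: classify non-overlapping large-scale windows.
    coarse = [classify(frames[s:s + large]) for s in starts]
    points = []
    for k in range(1, len(coarse)):
        left, right = coarse[k - 1], coarse[k]
        if left == right:
            continue
        # Boundary region: the two adjacent large windows whose labels differ.
        lo = starts[k - 1]
        hi = min(starts[k] + large, len(frames))
        # Step 2: classify small windows inside the boundary region only.
        fine = [classify(frames[s:s + small]) for s in range(lo, hi, small)]

        def score(j: int) -> int:
            # Hypothetical evaluation function: small windows before the split
            # that agree with the left label, plus those after it that agree
            # with the right label.
            return (sum(lab == left for lab in fine[:j])
                    + sum(lab == right for lab in fine[j:]))

        best = max(range(1, len(fine)), key=score)
        points.append(lo + best * small)
    return points


# Toy stream: 's' = speech, 'm' = music, with two isolated noisy frames.
frames = list("sssmssssssmmmmmsmmmm")
per_frame = [majority(frames[i:i + 1]) for i in range(len(frames))]
print(change_points(per_frame))                             # small-scale baseline
print(two_stage_segment(frames, majority, large=4, small=1))  # two-step result
```

On this toy stream the per-frame baseline reports a change at every noisy frame, whereas the two-step version reports only the single true boundary, which mirrors the reduction in false segmentation points that the abstract claims. The toy data is invented and says nothing about the paper's actual figures.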
Source
《计算机学报》
EI
CSCD
Peking University Core Journals
2006, No. 3, pp. 457-465 (9 pages)
Chinese Journal of Computers
Funding
Supported by the National Natural Science Foundation of China (60573060, 60205002, 60332010, 60372020) and the Beijing Natural Science Foundation (4042020).
Keywords
audio classification
audio segmentation
segmentation point evaluation function
false segmentation
neural network