Abstract
To bridge the "semantic gap" between low-level features and high-level semantics in video scene segmentation, a scene segmentation algorithm based on multimodal feature fusion and inter-shot competition is proposed. Image, text, and audio features are extracted from video frames as the low-level features. Euclidean distance and cosine similarity are used to measure the similarity of same-modality data, while canonical correlation analysis is used to measure the correlation of cross-modality data. Fusing the per-modality similarities and correlations yields the inter-shot similarity and inter-shot relevance, respectively. An inter-shot competition method then segments scenes from the similar shots and from the related shots separately, and the intersection of the two resulting boundary sets gives the final scene boundaries, thus realizing scene segmentation of the video. Experimental results show that the proposed method segments video scenes effectively, achieving a recall of 82.1% and a precision of 86.7%.
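The pipeline described in the abstract can be illustrated with a minimal sketch. Note that the function names, the distance-to-similarity mapping, and the equal fusion weights below are illustrative assumptions, not details taken from the paper; only the overall steps (per-modality similarity, weighted fusion, and intersection of the two boundary sets) follow the abstract.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors of the same modality.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_sim(a, b):
    # Map Euclidean distance to a similarity score in (0, 1]
    # (the mapping is an assumption; the paper only names the distance).
    return float(1.0 / (1.0 + np.linalg.norm(a - b)))

def fuse(scores, weights):
    # Weighted fusion of per-modality similarity (or correlation) scores
    # into a single inter-shot similarity (or relevance) value.
    w = np.asarray(weights, dtype=float)
    return float(np.dot(np.asarray(scores, dtype=float), w / w.sum()))

def final_boundaries(sim_bounds, corr_bounds):
    # Final scene boundaries: intersection of the boundary sets produced
    # by the similarity-based and the correlation-based segmentations.
    return sorted(set(sim_bounds) & set(corr_bounds))

# Example: two segmentations agree on shot boundaries 3 and 12.
print(final_boundaries([3, 7, 12], [3, 12, 18]))  # → [3, 12]
```

The intersection step is conservative by design: a scene boundary is kept only when both the similarity view and the correlation view of the shots agree on it.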
Source
《武汉理工大学学报(信息与管理工程版)》
CAS
2014, No. 6, pp. 759-763 (5 pages)
Journal of Wuhan University of Technology: Information & Management Engineering
Funding
Natural Science Foundation of Hubei Province (2009Chb008, 2010CDB06603)
Key Scientific Research Project of the Hubei Provincial Department of Education (D20101703)
Keywords
competition
multi-modality
similarity measurement
canonical correlation
scene segmentation