A New Special Video Detection Algorithm Based on 3D Convolution CoHOG and MIL (cited by: 7)
Abstract Existing video content detection algorithms based on histogram of oriented gradients (HOG) information focus on extracting features from two-dimensional video frames and ignore the correlation of video content along the time dimension. Extracting the latent co-occurrence relationships between local gradients can improve detection accuracy to some extent, and pooling adjacent features can effectively reduce the information lost during feature dimensionality reduction. On this basis, the proposed method uses inter-frame structural information to build a three-dimensional co-occurrence histogram of oriented gradients through convolution, then pools adjacent features to reduce the dimensionality of the descriptor. This addresses both the loss of recognition accuracy caused by ignoring inter-frame information and the difficulty of training on high-dimensional features. Video features are mapped to the instances and bags of multiple-instance learning (MIL), which makes it straightforward to detect videos of different lengths. In tests on the public Hockey and Movie datasets, the detection accuracy of the algorithm is 3% higher than that of the best existing algorithm on Hockey and 0.5% higher on Movie, which supports the effectiveness of the new feature and algorithm.

Existing video content detection algorithms based on HOG information focus on features extracted from individual two-dimensional video frames and ignore the correlation between frames along the time dimension. The frames of a video form an inseparable whole: only the full sequence of consecutive frames expresses the true and complete semantics, so information extracted from key frames alone is inaccurate. This temporal correlation carries semantic information and is important for video content detection. The latent co-occurrence relationship between local gradient direction features also helps improve accuracy. Just as important, pooling adjacent features reduces the dimensionality of high-dimensional features while avoiding the loss of hidden action information. We construct the 3D Conv-CoHOG feature by exploiting the hidden structural information between video frames along the time dimension, extending two-dimensional CoHOG features to three dimensions. Pooling over neighboring features reduces the feature dimension effectively. The algorithm addresses the loss of recognition accuracy caused by neglecting inter-frame information and the high computational complexity caused by high-dimensional features. Mapping video features to the instances and bags of multiple-instance learning makes it simple to handle videos of different lengths.

In this article, we first introduce the research field and the importance of detecting violent video content. We then summarize previous research and classify its findings into three categories: algorithms based on multi-modal audio and video features fused with color features, algorithms based on the fusion of different action features, and content detection algorithms based on neural networks and unsupervised feature extraction.

The most important part of the article introduces the structure of the algorithm. We introduce the concept of HOG features and their extraction process, compare the extraction of HOG, CoHOG, and Conv-CoHOG as well as that of HOG and HOG3D, and propose the new special video content detection algorithm, 3D convolution CoHOG, extended from Conv-CoHOG. We compare the proposed feature with the earlier ones in terms of computational dimension, feature dimension, and the relationship between adjacent features. Section 3.2 introduces the framework of the new algorithm. Sections 3.3 to 3.7 cover the construction of the feature extraction unit, the quantization of three-dimensional gradients, the extraction of Co-HOG3D, the extraction of Conv-CoHOG3D, and the training of the multiple-instance learning model.

Section 4.1 describes the two datasets used in the experiments, and Section 4.2 gives the parameter settings and evaluation criteria. We then analyze the experimental results. For training, we used three classifiers, each with several implementations. For testing, we compare the results obtained with different features, analyze the reasons for the differences, and assess the effectiveness of the new feature. Finally, we put forward an effective solution for special video content detection. The proposed algorithm achieves the highest detection accuracy on the Hockey and Movie datasets: 3% higher than the best existing algorithm on Hockey and 0.5% higher on Movie.
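As a rough illustration of two ideas in the abstract (co-occurrence of quantized gradient orientations across frames, and mapping a video of any length to a multiple-instance "bag"), the following Python sketch may help. It is not the authors' implementation. The function names, the 8-bin quantization, the single pair offset, the fixed-length non-overlapping clips, and the "a bag is positive if any instance is positive" rule (the standard MIL assumption) are all assumptions made for illustration, and the convolution and pooling steps of the paper are omitted.

```python
import math


def orientation_bins(frame, n_bins=8):
    """Quantize each pixel's 2D gradient orientation into n_bins bins.

    `frame` is a list of rows of grey values. Borders are handled by
    clamping indices. Assumption: no magnitude weighting is applied, so
    flat pixels (zero gradient) simply fall into bin 0.
    """
    h, w = len(frame), len(frame[0])
    bins = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = frame[y][min(x + 1, w - 1)] - frame[y][max(x - 1, 0)]
            gy = frame[min(y + 1, h - 1)][x] - frame[max(y - 1, 0)][x]
            angle = math.atan2(gy, gx) % (2 * math.pi)
            bins[y][x] = int(angle / (2 * math.pi) * n_bins) % n_bins
    return bins


def cooccurrence_histogram(clip, offset, n_bins=8):
    """Count orientation pairs (bin at p, bin at p + offset) over a clip.

    `offset` is (dt, dy, dx); a non-zero dt pairs pixels from different
    frames, which is what brings inter-frame structure into the feature.
    """
    dt, dy, dx = offset
    q = [orientation_bins(frame, n_bins) for frame in clip]
    n_t, n_y, n_x = len(q), len(q[0]), len(q[0][0])
    hist = [0] * (n_bins * n_bins)
    for t in range(n_t):
        for y in range(n_y):
            for x in range(n_x):
                t2, y2, x2 = t + dt, y + dy, x + dx
                if 0 <= t2 < n_t and 0 <= y2 < n_y and 0 <= x2 < n_x:
                    hist[q[t][y][x] * n_bins + q[t2][y2][x2]] += 1
    return hist


def video_to_bag(frames, clip_len, offset=(1, 0, 0)):
    """Turn a video of any length into a bag: one instance per clip."""
    starts = range(0, len(frames) - clip_len + 1, clip_len)
    return [cooccurrence_histogram(frames[s:s + clip_len], offset)
            for s in starts]


def bag_label(instance_scores, threshold=0.5):
    """Standard MIL assumption: positive if any instance is positive."""
    return any(score >= threshold for score in instance_scores)
```

With this layout, a longer video simply yields a bag with more instances while each instance keeps the same fixed dimension (`n_bins * n_bins`), so a bag-level classifier never needs videos of equal length. The per-instance scores passed to `bag_label` would come from whichever classifier is trained on the instances; that classifier is not shown here.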
Authors 宋伟 (SONG Wei); 任栋 (REN Dong); 于京 (YU Jing); 齐振国 (QI Zhen-Guo) (School of Information Engineering, Minzu University of China, Beijing 100081; School of Electronic Information Engineering, Beijing Jiaotong University, Beijing 100044)
Source Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2019, No. 1, pp. 149-163 (15 pages)
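Funding National Natural Science Foundation of China (61503424, 61331013); China Scholarship Council; Minzu University of China First-Class University and First-Class Discipline program ("Image Engineering"); Minzu University of China Young Teachers' Research Capacity Enhancement Program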
Keywords special video detection; video content detection; histogram of oriented gradients; multiple-instance learning; convolution; pooling; extreme learning machine