In order to take advantage of the logical structure of video sequences and improve the recognition accuracy of the human action, a novel hybrid human action detection method based on three descriptors and decision lev...In order to take advantage of the logical structure of video sequences and improve the recognition accuracy of the human action, a novel hybrid human action detection method based on three descriptors and decision level fusion is proposed. Firstly, the minimal 3D space region of human action region is detected by combining frame difference method and Vi BE algorithm, and the three-dimensional histogram of oriented gradient(HOG3D) is extracted. At the same time, the characteristics of global descriptors based on frequency domain filtering(FDF) and the local descriptors based on spatial-temporal interest points(STIP) are extracted. Principal component analysis(PCA) is implemented to reduce the dimension of the gradient histogram and the global descriptor, and bag of words(BoW) model is applied to describe the local descriptors based on STIP. Finally, a linear support vector machine(SVM) is used to create a new decision level fusion classifier. Some experiments are done to verify the performance of the multi-features, and the results show that they have good representation ability and generalization ability. Otherwise, the proposed scheme obtains very competitive results on the well-known datasets in terms of mean average precision.展开更多
近年来,随着监控摄像头的不断增多和互联网的迅速发展,监控视频与网络视频越来越多,对视频进行自动行为冲突检测对降低人为审核导致的隐私信息泄露风险及维护社会治安、净化网络环境等具有重要意义.为了充分提取视频中的行为冲突特征,...近年来,随着监控摄像头的不断增多和互联网的迅速发展,监控视频与网络视频越来越多,对视频进行自动行为冲突检测对降低人为审核导致的隐私信息泄露风险及维护社会治安、净化网络环境等具有重要意义.为了充分提取视频中的行为冲突特征,并获得有较好泛化能力与检测效果的模型,采用I3D(inflated 3D convolutional network)与VGGish,基于XD-Violence进行多模态特征的提取,并提出了基于Transformer和图卷积网络的行为冲突检测模型TG-BCDM(behavior conflict detection model based on Transformer and graph convolution networks).该模型包含Transformer编码器模块和图卷积模块,可以在有效捕捉视频中长距离依赖关系的同时,关注视频特征的全局信息和局部信息.经过实验证明,该模型优于现有的8种方法.展开更多
Action recognition is an important topic in computer vision. Recently, deep learning technologies have been successfully used in lots of applications including video data for sloving recognition problems. However, mos...Action recognition is an important topic in computer vision. Recently, deep learning technologies have been successfully used in lots of applications including video data for sloving recognition problems. However, most existing deep learning based recognition frameworks are not optimized for action in the surveillance videos. In this paper, we propose a novel method to deal with the recognition of different types of actions in outdoor surveillance videos. The proposed method first introduces motion compensation to improve the detection of human target. Then, it uses three different types of deep models with single and sequenced images as inputs for the recognition of different types of actions. Finally, predictions from different models are fused with a linear model. Experimental results show that the proposed method works well on the real surveillance videos.展开更多
基金supported by the National Natural Science Foundation of China under Grant No. 61503424the Research Project by The State Ethnic Affairs Commission under Grant No. 14ZYZ017+2 种基金the Jiangsu Future Networks Innovation Institute-Prospective Research Project on Future Networks under Grant No. BY2013095-2-14the Fundamental Research Funds for the Central Universities No. FRF-TP-14-046A2the first-class discipline construction transitional funds of Minzu University of China
文摘In order to take advantage of the logical structure of video sequences and improve the recognition accuracy of the human action, a novel hybrid human action detection method based on three descriptors and decision level fusion is proposed. Firstly, the minimal 3D space region of human action region is detected by combining frame difference method and Vi BE algorithm, and the three-dimensional histogram of oriented gradient(HOG3D) is extracted. At the same time, the characteristics of global descriptors based on frequency domain filtering(FDF) and the local descriptors based on spatial-temporal interest points(STIP) are extracted. Principal component analysis(PCA) is implemented to reduce the dimension of the gradient histogram and the global descriptor, and bag of words(BoW) model is applied to describe the local descriptors based on STIP. Finally, a linear support vector machine(SVM) is used to create a new decision level fusion classifier. Some experiments are done to verify the performance of the multi-features, and the results show that they have good representation ability and generalization ability. Otherwise, the proposed scheme obtains very competitive results on the well-known datasets in terms of mean average precision.
文摘近年来,随着监控摄像头的不断增多和互联网的迅速发展,监控视频与网络视频越来越多,对视频进行自动行为冲突检测对降低人为审核导致的隐私信息泄露风险及维护社会治安、净化网络环境等具有重要意义.为了充分提取视频中的行为冲突特征,并获得有较好泛化能力与检测效果的模型,采用I3D(inflated 3D convolutional network)与VGGish,基于XD-Violence进行多模态特征的提取,并提出了基于Transformer和图卷积网络的行为冲突检测模型TG-BCDM(behavior conflict detection model based on Transformer and graph convolution networks).该模型包含Transformer编码器模块和图卷积模块,可以在有效捕捉视频中长距离依赖关系的同时,关注视频特征的全局信息和局部信息.经过实验证明,该模型优于现有的8种方法.
文摘Action recognition is an important topic in computer vision. Recently, deep learning technologies have been successfully used in lots of applications including video data for sloving recognition problems. However, most existing deep learning based recognition frameworks are not optimized for action in the surveillance videos. In this paper, we propose a novel method to deal with the recognition of different types of actions in outdoor surveillance videos. The proposed method first introduces motion compensation to improve the detection of human target. Then, it uses three different types of deep models with single and sequenced images as inputs for the recognition of different types of actions. Finally, predictions from different models are fused with a linear model. Experimental results show that the proposed method works well on the real surveillance videos.