Key frame extraction based on sparse coding can reduce the redundancy of continuous frames and concisely express the entire video.However,how to develop a key frame extraction algorithm that can automatically extract ...Key frame extraction based on sparse coding can reduce the redundancy of continuous frames and concisely express the entire video.However,how to develop a key frame extraction algorithm that can automatically extract a few frames with a low reconstruction error remains a challenge.In this paper,we propose a novel model of structured sparse-codingbased key frame extraction,wherein a nonconvex group log-regularizer is used with strong sparsity and a low reconstruction error.To automatically extract key frames,a decomposition scheme is designed to separate the sparse coefficient matrix by rows.The rows enforced by the nonconvex group log-regularizer become zero or nonzero,leading to the learning of the structured sparse coefficient matrix.To solve the nonconvex problems due to the log-regularizer,the difference of convex algorithm(DCA)is employed to decompose the log-regularizer into the difference of two convex functions related to the l1 norm,which can be directly obtained through the proximal operator.Therefore,an efficient structured sparse coding algorithm with the group log-regularizer for key frame extraction is developed,which can automatically extract a few frames directly from the video to represent the entire video with a low reconstruction error.Experimental results demonstrate that the proposed algorithm can extract more accurate key frames from most Sum Me videos compared to the stateof-the-art methods.Furthermore,the proposed algorithm can obtain a higher compression with a nearly 18% increase compared to sparse modeling representation selection(SMRS)and an 8% increase compared to SC-det on the VSUMM dataset.展开更多
Classic sparse representation, as one of prevalent feature learning methods, is successfully applied for different computer vision tasks. However it has some intrinsic defects in object detection. Firstly, how to lear...Classic sparse representation, as one of prevalent feature learning methods, is successfully applied for different computer vision tasks. However it has some intrinsic defects in object detection. Firstly, how to learn a discriminative dictionary for object detection is a hard problem. Secondly, it is usually very time-consuming to learn dictionary based features in a traditional exhaustive search manner like sliding window. In this paper, we propose a novel feature learning framework for object detection with the structure sparsity constraint and classification error minimization constraint to learn a discriminative dictionary. For improving the efficiency, we just learn sparse representation coefficients from object candidate regions and feed them to a kernelized SVM classifier. Experiments on INRIA Person Dataset and Pascal VOC 2007 challenge dataset clearly demonstrate the effectiveness of the proposed approach compared with two state-of-the-art baselines.展开更多
基金supported in part by the National Natural Science Foundation of China(61903090,61727810,62073086,62076077,61803096,U191140003)the Guangzhou Science and Technology Program Project(202002030289)Japan Society for the Promotion of Science(JSPS)KAKENHI(18K18083)。
文摘Key frame extraction based on sparse coding can reduce the redundancy of continuous frames and concisely express the entire video.However,how to develop a key frame extraction algorithm that can automatically extract a few frames with a low reconstruction error remains a challenge.In this paper,we propose a novel model of structured sparse-codingbased key frame extraction,wherein a nonconvex group log-regularizer is used with strong sparsity and a low reconstruction error.To automatically extract key frames,a decomposition scheme is designed to separate the sparse coefficient matrix by rows.The rows enforced by the nonconvex group log-regularizer become zero or nonzero,leading to the learning of the structured sparse coefficient matrix.To solve the nonconvex problems due to the log-regularizer,the difference of convex algorithm(DCA)is employed to decompose the log-regularizer into the difference of two convex functions related to the l1 norm,which can be directly obtained through the proximal operator.Therefore,an efficient structured sparse coding algorithm with the group log-regularizer for key frame extraction is developed,which can automatically extract a few frames directly from the video to represent the entire video with a low reconstruction error.Experimental results demonstrate that the proposed algorithm can extract more accurate key frames from most Sum Me videos compared to the stateof-the-art methods.Furthermore,the proposed algorithm can obtain a higher compression with a nearly 18% increase compared to sparse modeling representation selection(SMRS)and an 8% increase compared to SC-det on the VSUMM dataset.
基金Supported by the National Natural Science Foundation of China(61231015,61170023)National High Technology Research and Development Program of China(863 Program,2015AA016306)+3 种基金Internet of Things Development Funding Project of Ministry of Industry in 2013(No.25)Technology Research Program of Ministry of Public Security(2014JSYJA016)Major Science and Technology Innovation Plan of Hubei Province(2013AAA020)the Natural Science Foundation of Hubei Province(2014CFB712)
文摘Classic sparse representation, as one of prevalent feature learning methods, is successfully applied for different computer vision tasks. However it has some intrinsic defects in object detection. Firstly, how to learn a discriminative dictionary for object detection is a hard problem. Secondly, it is usually very time-consuming to learn dictionary based features in a traditional exhaustive search manner like sliding window. In this paper, we propose a novel feature learning framework for object detection with the structure sparsity constraint and classification error minimization constraint to learn a discriminative dictionary. For improving the efficiency, we just learn sparse representation coefficients from object candidate regions and feed them to a kernelized SVM classifier. Experiments on INRIA Person Dataset and Pascal VOC 2007 challenge dataset clearly demonstrate the effectiveness of the proposed approach compared with two state-of-the-art baselines.