Funding: The authors would like to thank Research Supporting Project Number (RSP2024R444), King Saud University, Riyadh, Saudi Arabia.
Abstract: The exponential growth of video data, aided by rapid advances in multimedia technologies, has made it difficult for users to obtain information from a large video series. Static video summarization is the process of producing an abstract of an entire video that consists of its most representative frames. It enables rapid exploration, indexing, and retrieval of massive video libraries. We propose a framework for static video summarization based on Binary Robust Invariant Scalable Keypoints (BRISK) and the bisecting K-means clustering algorithm. The proposed method identifies relevant frames by extracting BRISK keypoints and descriptors from video sequences. The BRISK features of the video frames are clustered using bisecting K-means, and the frame nearest to each cluster center is selected as a keyframe. The appropriate number of clusters is determined using the silhouette coefficient, without requiring any clustering parameters to be set. Experiments were carried out on the publicly available Open Video Project (OVP) dataset, which contains videos of different genres. The proposed method is compared with existing methods using a variety of evaluation metrics, and it achieves a trade-off between computational cost and quality.
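The clustering stage described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each frame has already been reduced to a fixed-length feature vector (the abstract does not say how BRISK descriptors are aggregated per frame), and the function names, the deterministic seeding, the rule of splitting the cluster with the largest squared error, and the `k_max` search bound are all hypothetical choices made for illustration.

```python
import math

def centroid(points):
    # Component-wise mean of a non-empty list of equal-length vectors.
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def two_means(idx, feats, iters=20):
    # Split one cluster (a list of frame indices) into two with plain 2-means.
    # Assumed seeding: the first member and the member farthest from it.
    a = feats[idx[0]]
    b = feats[max(idx, key=lambda i: math.dist(feats[i], a))]
    left, right = [idx[0]], []
    for _ in range(iters):
        new_left = [i for i in idx
                    if math.dist(feats[i], a) <= math.dist(feats[i], b)]
        new_right = [i for i in idx if i not in new_left]
        if not new_left or not new_right:
            break
        left, right = new_left, new_right
        new_a = centroid([feats[i] for i in left])
        new_b = centroid([feats[i] for i in right])
        if new_a == a and new_b == b:
            break
        a, b = new_a, new_b
    return left, right

def sse(idx, feats):
    # Sum of squared distances of a cluster's members to its centroid.
    c = centroid([feats[i] for i in idx])
    return sum(math.dist(feats[i], c) ** 2 for i in idx)

def bisecting_kmeans(feats, k):
    # Repeatedly bisect the cluster with the largest squared error (assumed rule).
    clusters = [list(range(len(feats)))]
    while len(clusters) < k:
        target = max(clusters, key=lambda c: sse(c, feats))
        left, right = two_means(target, feats)
        if not left or not right:
            break  # the chosen cluster cannot be split further
        clusters.remove(target)
        clusters += [left, right]
    return clusters

def silhouette(clusters, feats):
    # Mean silhouette coefficient; singleton clusters contribute 0.
    scores = []
    for ci, c in enumerate(clusters):
        for i in c:
            if len(c) == 1:
                scores.append(0.0)
                continue
            a = sum(math.dist(feats[i], feats[j]) for j in c if j != i) / (len(c) - 1)
            b = min(sum(math.dist(feats[i], feats[j]) for j in o) / len(o)
                    for oi, o in enumerate(clusters) if oi != ci)
            scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)

def select_keyframes(feats, k_max=6):
    # Try k = 2..k_max, keep the clustering with the best silhouette,
    # then return the index of the frame nearest to each cluster center.
    best = None
    for k in range(2, min(k_max, len(feats) - 1) + 1):
        clusters = bisecting_kmeans(feats, k)
        if len(clusters) < 2:
            break
        score = silhouette(clusters, feats)
        if best is None or score > best[0]:
            best = (score, clusters)
    if best is None:
        return [0]
    keys = []
    for c in best[1]:
        ctr = centroid([feats[i] for i in c])
        keys.append(min(c, key=lambda i: math.dist(feats[i], ctr)))
    return sorted(keys)

# Toy 2-D "frame features" forming three well-separated groups.
feats = [[0, 0], [0.1, 0], [0, 0.1],
         [5, 5], [5.1, 5], [5, 5.1],
         [10, 0], [10.1, 0], [10, 0.1]]
keyframes = select_keyframes(feats)
```

On the toy input, the silhouette score peaks at three clusters, so one keyframe is returned per group. With real video, the feature vectors would come from a separate BRISK extraction step that this sketch does not cover.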
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2018R1D1A1B07042967), and by the Soonchunhyang University Research Fund.
Abstract: Violence recognition is crucial because of its applications in security and law enforcement. Existing semi-automated systems rely on tedious manual surveillance, which causes human errors and makes these systems less effective. Several approaches have been proposed using trajectory-based, non-object-centric, and deep-learning-based methods. Previous studies have shown that deep learning techniques attain higher accuracy and lower error rates than other methods. However, their performance must still be improved. This study explores state-of-the-art deep learning architectures, namely convolutional neural networks (CNNs) and Inception v4, to detect and recognize violence in video data. In the proposed framework, a keyframe extraction technique eliminates duplicate consecutive frames. This keyframing phase reduces the size of the training data and hence decreases the computational cost. For feature selection and classification, the sequential CNN uses a single kernel size, whereas the Inception v4 CNN uses multiple kernels in different layers of the architecture. For the empirical analysis, four widely used standard datasets covering diverse activities are used. The results confirm that the proposed approach attains 98% accuracy, reduces the computational cost, and outperforms existing techniques for violence detection and recognition.
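The keyframing step mentioned in this abstract, which removes duplicate consecutive frames, might look roughly like the sketch below. The abstract does not say how duplicates are detected, so everything here is an assumption: the mean absolute pixel difference, the `threshold` value, the comparison against the last kept frame, and the function names are all hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    # Average absolute intensity difference between two equal-sized frames,
    # each given as a flat list of pixel values.
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def drop_duplicate_frames(frames, threshold=2.0):
    # Keep a frame only if it differs enough from the last kept frame.
    # The threshold is an assumed tuning value, not taken from the source.
    kept = []
    last = None
    for idx, frame in enumerate(frames):
        if last is None or mean_abs_diff(frame, last) > threshold:
            kept.append(idx)
            last = frame
    return kept

# Five tiny 4-pixel "frames": two near-duplicate pairs, then a return to the first scene.
frames = [
    [10, 10, 10, 10],
    [10, 11, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 199, 200],
    [10, 10, 10, 10],
]
kept_indices = drop_duplicate_frames(frames)
```

Comparing each frame with the last kept frame, and not with its immediate predecessor, stops slow drift from passing unnoticed. Frames that recur later in the video are still kept, since only consecutive duplicates are targeted.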