Due to the exponential growth of video data, aided by rapid advancements in multimedia technologies, it has become difficult for users to obtain information from large video collections. The process of producing an abstract of an entire video that includes its most representative frames is known as static video summarization. This method enables rapid exploration, indexing, and retrieval of massive video libraries. We propose a framework for static video summarization based on Binary Robust Invariant Scalable Keypoints (BRISK) and the bisecting K-means clustering algorithm. The method recognizes relevant frames by extracting BRISK keypoints and descriptors from video sequences. The frames' BRISK features are clustered using bisecting K-means, and each keyframe is determined by selecting the frame closest to its cluster center. Rather than fixing clustering parameters in advance, the appropriate number of clusters is determined using the silhouette coefficient. Experiments were carried out on the publicly available Open Video Project (OVP) dataset, which contains videos of different genres. The proposed method's effectiveness is compared to that of existing methods using a variety of evaluation metrics, and it achieves a favorable trade-off between computational cost and summary quality.
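As a rough sketch of the clustering stage, the snippet below pairs a minimal bisecting K-means with silhouette-based selection of the cluster count and nearest-to-centroid keyframe picking. Random 2-D blobs stand in for real per-frame BRISK descriptors (which a detector such as OpenCV's `BRISK_create` would supply); the splitting, scoring, and selection details are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_means(X, iters=20):
    """One bisection step: plain 2-means on the rows of X."""
    centers = X[rng.choice(len(X), 2, replace=False)].astype(float)
    labels = np.zeros(len(X), int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in (0, 1):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def bisecting_kmeans(X, k):
    """Repeatedly split the largest cluster with 2-means until k clusters."""
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        clusters.sort(key=len)
        big = clusters.pop()                   # split the largest cluster
        labels = two_means(X[big])
        clusters += [big[labels == 0], big[labels == 1]]
    assign = np.empty(len(X), int)
    for c, members in enumerate(clusters):
        assign[members] = c
    return assign

def mean_silhouette(X, assign):
    """Average silhouette coefficient over all points."""
    ids = np.arange(len(X))
    scores = []
    for i in ids:
        d = np.linalg.norm(X - X[i], axis=1)
        own = (assign == assign[i]) & (ids != i)
        if not own.any():
            scores.append(0.0)
            continue
        a = d[own].mean()
        b = min(d[assign == c].mean()
                for c in np.unique(assign) if c != assign[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# stand-in "frame descriptors": three well-separated 2-D blobs
X = np.concatenate([rng.normal(c, 0.5, size=(30, 2))
                    for c in ((0, 0), (10, 0), (0, 10))])
results = {k: bisecting_kmeans(X, k) for k in (2, 3, 4)}
best_k = max(results, key=lambda k: mean_silhouette(X, results[k]))
assign = results[best_k]
# keyframe per cluster: the frame closest to its cluster centroid
keyframes = [int(np.argmin(np.linalg.norm(X - X[assign == c].mean(axis=0), axis=1)))
             for c in range(best_k)]
print(best_k, keyframes)
```

On this toy data the silhouette peaks at the true number of blobs, mirroring how the paper avoids a hand-set cluster count.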
Violence recognition is crucial because of its applications in security and law enforcement. Existing semi-automated systems rely on tedious manual surveillance, which introduces human error and makes them less effective. Several approaches have been proposed using trajectory-based, non-object-centric, and deep-learning-based methods. Previous studies have shown that deep learning techniques attain higher accuracy and lower error rates than other methods, but their performance still needs improvement. This study explores state-of-the-art deep learning architectures, convolutional neural networks (CNNs) and Inception-v4, to detect and recognize violence in video data. In the proposed framework, a keyframe extraction technique eliminates duplicate consecutive frames. This keyframing phase reduces the size of the training data and hence the computational cost. For feature selection and classification, the sequential CNN uses a single kernel size, whereas the Inception-v4 CNN uses multiple kernel sizes across the layers of its architecture. For empirical analysis, four widely used standard datasets covering diverse activities are used. The results confirm that the proposed approach attains 98% accuracy, reduces computational cost, and outperforms existing violence detection and recognition techniques.
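The abstract does not specify how duplicate consecutive frames are detected; a minimal sketch, assuming a mean-absolute-difference threshold against the last kept frame, could look like this (the `threshold` value and the difference criterion are illustrative assumptions):

```python
import numpy as np

def extract_keyframes(frames, threshold=10.0):
    """Keep a frame only if its mean absolute difference from the last
    kept frame exceeds the threshold, dropping near-duplicate frames."""
    kept = [0]
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float) -
                      frames[kept[-1]].astype(float)).mean()
        if diff > threshold:
            kept.append(i)
    return kept

# toy "video": frames 0-2 identical, frames 3-5 identical but brighter
a = np.zeros((8, 8), np.uint8)
b = np.full((8, 8), 200, np.uint8)
video = [a, a, a, b, b, b]
print(extract_keyframes(video))  # → [0, 3]
```

Only two of the six frames survive, which is the kind of reduction in training-set size the keyframing phase aims for.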
Human activity detection and recognition is a challenging task. Video surveillance can benefit greatly from advances in the Internet of Things (IoT) and cloud computing, and artificial-intelligence IoT (AIoT) devices form the basis of a smart city. This research presents intelligent dynamic gesture recognition (IDGR) using a convolutional neural network (CNN) combined with edit distance for video recognition. The proposed system has been evaluated on AIoT-enabled devices for static and dynamic gestures of Pakistani sign language (PSL), although the methodology can work efficiently for any type of video. The research concludes that deep learning and convolutional neural networks provide a highly appropriate solution, retaining the discriminative and dynamic information of the input action. The approach recognizes dynamic gestures by applying CNN-based image recognition to keyframes extracted from the human activity, then uses edit distance to determine the word label to which a set of frames belongs. Simulation results show that with 400 videos per human action, 100 epochs, and a 234×234 image size, the system reaches 90.79% accuracy, which is reasonable for a relatively small dataset compared with previously published techniques.
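The edit-distance step can be sketched as matching a per-keyframe prediction sequence against a word lexicon. The lexicon and the one-letter-per-keyframe encoding below are hypothetical; only the Levenshtein matching itself follows the abstract:

```python
def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete ca
                                     dp[j - 1] + 1,      # insert cb
                                     prev + (ca != cb))  # substitute
    return dp[-1]

# hypothetical PSL lexicon; one CNN-predicted letter per keyframe
lexicon = ["HELLO", "THANKS", "PLEASE"]
observed = "HELO"   # one frame of the double "L" was lost in keyframing
word = min(lexicon, key=lambda w: edit_distance(observed, w))
print(word)  # → HELLO
```

Because edit distance tolerates dropped or duplicated keyframes, a slightly corrupted frame sequence still maps to the right word label.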
A new technique applying fuzzy logic in a recursive fashion is presented to deal with Gaussian noise. In this technique, the keyframes and the in-between frames are identified first, and each keyframe is denoised efficiently. The keyframe is then compared with the in-between frames to remove noise: the frames are partitioned into blocks, the motion vector is calculated, and the difference is measured using a dissimilarity function. If a block has no motion vector, its value is copied to the in-between frame; otherwise, the difference between the blocks is calculated and filtered with temporal filtering. Blocks are processed in an overlapping manner to avoid blocking effects and to reduce the additional edges created during processing. Simulation results show that the peak signal-to-noise ratio of the new technique improves by up to 1 dB while the execution time is greatly reduced.
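The copy-or-filter decision per block can be sketched as below. This is a simplified stand-in: it uses a plain block-difference test on non-overlapping blocks and a blend in place of the paper's motion-vector search, overlapping blocks, and fuzzy temporal filter.

```python
import numpy as np

def denoise_between(key, between, block=4, motion_thresh=5.0, alpha=0.5):
    """Per-block rule: a static block (small difference from the keyframe)
    is replaced by the denoised keyframe block; a moving block is blended
    with the keyframe as a crude temporal filter."""
    out = between.astype(float).copy()
    h, w = key.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            kb = key[y:y+block, x:x+block].astype(float)
            bb = between[y:y+block, x:x+block].astype(float)
            if np.abs(kb - bb).mean() < motion_thresh:
                out[y:y+block, x:x+block] = kb             # static: copy
            else:
                out[y:y+block, x:x+block] = alpha * kb + (1 - alpha) * bb
    return out

key = np.zeros((8, 8))            # denoised keyframe
between = np.full((8, 8), 2.0)    # mild noise everywhere (static content)
between[:4, :4] = 100.0           # one genuinely moving block
out = denoise_between(key, between)
print(out[:4, :4].mean(), out[4:, 4:].mean())
```

Static blocks inherit the clean keyframe values outright, while the moving block keeps a filtered mix of both frames.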
Porno video recognition is important for Internet content monitoring. In this paper, a novel porno video recognition method that fuses audio and video cues is proposed. First, global color and texture features and local scale-invariant feature transform (SIFT) features are extracted to train multiple support vector machine (SVM) classifiers for different erotic categories of image frames. Then, two continuous-density hidden Markov models (CHMMs) are built to recognize porno sounds. Finally, a fusion method based on Bayes' rule combines the classification results from the video and audio cues. Experimental results show that our model outperforms six state-of-the-art methods.
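One common Bayes-rule fusion, assuming the two modalities are conditionally independent given the class, multiplies the per-class posteriors and divides out the prior. The numbers below are illustrative, and the paper may use a different fusion formula:

```python
import numpy as np

def bayes_fuse(p_video, p_audio, prior):
    """Fuse per-class posteriors from independent modalities with
    Bayes' rule: p(c | video, audio) ∝ p(c|video) * p(c|audio) / p(c)."""
    fused = p_video * p_audio / prior
    return fused / fused.sum()

# toy posteriors over classes [porno, normal] (values are illustrative)
prior = np.array([0.5, 0.5])
p_video = np.array([0.7, 0.3])   # from the SVM frame classifiers
p_audio = np.array([0.9, 0.1])   # from the CHMM audio models
fused = bayes_fuse(p_video, p_audio, prior)
print(fused)
```

When both modalities lean the same way, the fused posterior is more confident than either cue alone, which is the point of combining them.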
Motion capture is increasingly used in games and movies, but often requires editing before it can be used, for many reasons. The motion may need to be adjusted to interact correctly with virtual objects or to fix problems that result from mapping the motion to a character of a different size; beyond such technical requirements, directors can request stylistic changes. Unfortunately, editing is laborious because of the low-level representation of the data. While existing motion editing methods accomplish modest changes, larger edits can require the artist to "re-animate" the motion by manually selecting a subset of the frames as keyframes. In this paper, we automatically find sets of frames to serve as keyframes for editing the motion. We formulate the selection of an optimal set of keyframes as a shortest-path problem and solve it efficiently using dynamic programming. We create a new simplified animation by interpolating the found keyframes using a naive curve-fitting technique. Our algorithm can simplify motion capture to around 10% of the original number of frames while retaining most of its detail. By simplifying animation with our algorithm, we realize a new approach to motion editing and stylization founded on the time-tested keyframe interface. We present results showing that our algorithm outperforms both research algorithms and a leading commercial tool.
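The shortest-path formulation can be sketched on a 1-D signal: frames are nodes, an edge (i, j) costs the error of linearly interpolating between keyframes i and j plus a per-keyframe penalty, and dynamic programming finds the cheapest path from the first frame to the last. The L1 error and the `penalty` term are illustrative assumptions; the paper's cost is defined over full skeletal motion.

```python
import numpy as np

def segment_cost(signal, i, j):
    """Error of linearly interpolating the signal between keyframes i and j."""
    t = np.linspace(0.0, 1.0, j - i + 1)
    interp = signal[i] * (1 - t) + signal[j] * t
    return float(np.abs(signal[i:j+1] - interp).sum())

def select_keyframes(signal, penalty=1.0):
    """Shortest path from frame 0 to frame n-1; edge (i, j) costs the
    interpolation error plus a fixed per-keyframe penalty."""
    n = len(signal)
    best = [np.inf] * n
    back = [0] * n
    best[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            c = best[i] + segment_cost(signal, i, j) + penalty
            if c < best[j]:
                best[j], back[j] = c, i
    keys = [n - 1]
    while keys[-1] != 0:
        keys.append(back[keys[-1]])
    return keys[::-1]

# piecewise-linear toy motion curve with a corner at frame 5
sig = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], float)
print(select_keyframes(sig))  # → [0, 5, 10]
```

The DP places a keyframe exactly at the corner, where interpolation would otherwise fail; raising the penalty trades detail for fewer keyframes.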
The massive number of web videos creates an imperative demand for efficiently grasping major events. However, the distinct characteristics of web videos, such as the limited number of features, noisy text information, and unavoidable errors in near-duplicate keyframe (NDK) detection, make web video event mining a challenging task. In this paper, we propose a novel four-stage framework to improve the performance of web video event mining. Data preprocessing is the first stage. Multiple correspondence analysis (MCA) is then applied to explore the correlation between terms and classes, aiming to bridge the gap between NDKs and high-level semantic concepts. Next, co-occurrence information is used to measure the similarity between NDKs and classes using the NDK-within-video information. Finally, both are integrated for web video event mining through negative NDK pruning and positive NDK enhancement. Moreover, both NDKs and terms with relatively low frequencies are treated as useful information in our experiments. Experimental results on large-scale web videos from YouTube demonstrate that the proposed framework outperforms several existing mining methods and obtains good results for web video event mining.
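The co-occurrence stage can be illustrated by counting, for each NDK, how often it appears in videos of each event class and row-normalizing the counts into a similarity score. The toy videos, NDK IDs, and normalization below are illustrative assumptions, not the paper's exact MCA-based formulation:

```python
import numpy as np

# toy data: which NDKs appear in which videos, and each video's event class
videos = {"v1": {"ndks": {"n1", "n2"}, "cls": "concert"},
          "v2": {"ndks": {"n1"},       "cls": "concert"},
          "v3": {"ndks": {"n3"},       "cls": "protest"}}

ndks = sorted({n for v in videos.values() for n in v["ndks"]})
classes = sorted({v["cls"] for v in videos.values()})

# co-occurrence counts: rows are NDKs, columns are event classes
cooc = np.zeros((len(ndks), len(classes)))
for v in videos.values():
    for n in v["ndks"]:
        cooc[ndks.index(n), classes.index(v["cls"])] += 1

# NDK-to-class similarity: row-normalized co-occurrence
sim = cooc / cooc.sum(axis=1, keepdims=True)
print(dict(zip(ndks, sim.tolist())))
```

NDKs whose similarity concentrates on one class become candidates for positive enhancement; NDKs spread thinly across classes are candidates for pruning.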
We present a sketch-based rotation editing system for enriching rotational motion in keyframe animations. Given a set of keyframe orientations of a rigid object, the user first edits its angular velocity trajectory by sketching curves, and then the system computes the altered rotational motion by solving a variational curve-fitting problem. The solved rotational motion not only satisfies the orientation constraints at the keyframes, but also fits the user-specified angular velocity trajectory well. Our system is simple and easy to use. We demonstrate its usefulness by adding interesting and realistic rotational details to several keyframe animations.
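A heavily simplified 1-D version of the variational idea: find the velocity curve closest (in least squares) to the user's sketch whose integral still carries the start orientation to the end orientation. With a single linear constraint the Lagrange condition makes the correction a uniform offset; the paper's full problem is over 3-D rotations and far richer than this scalar sketch.

```python
import numpy as np

def fit_velocity(sketch, dt, theta0, theta1):
    """Closed-form solution of
        min ||w - sketch||^2   s.t.   sum(w) * dt = theta1 - theta0,
    i.e. the sketched angular-velocity samples shifted by a uniform
    offset so the integrated angle hits the keyframe constraint."""
    target = theta1 - theta0
    offset = (target - sketch.sum() * dt) / (len(sketch) * dt)
    return sketch + offset

sketch = np.array([1.0, 2.0, 3.0, 2.0, 1.0])  # sketched angular velocity
w = fit_velocity(sketch, dt=0.1, theta0=0.0, theta1=1.0)
print(w, w.sum() * 0.1)
```

The fitted curve keeps the sketched shape while its integral exactly matches the angle difference between the two keyframes.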
Funding: The authors would like to thank the Research Supporting Project (RSP2024R444), King Saud University, Riyadh, Saudi Arabia.
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2018R1D1A1B07042967), and by the Soonchunhyang University Research Fund.
Funding: Supported by the National Natural Science Foundation of China (90920001, 61101212) and the Fundamental Research Funds for the Central Universities.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61373121, 61071184, 60972111, and 61036008; the Research Funds for the Doctoral Program of Higher Education of China under Grant No. 20100184120009; the Program for the Sichuan Provincial Science Fund for Distinguished Young Scholars under Grant Nos. 2012JQ0029 and 13QNJJ0149; the Fundamental Research Funds for the Central Universities of China under Grant Nos. SWJTU09CX032 and SWJTU10CX08; and the Program of the China Scholarship Council under Grant No. 201207000050.
Funding: Supported by the National Natural Science Foundation of China (No. 61003145) and the Fundamental Research Funds for the Central Universities, China (No. 2009QNA5018).