Funding: This work was supported by the National Research Foundation of Korea (NRF) Grant (Nos. 2018R1A5A7059549, 2020R1A2C1014037) and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) Grant (No. 2020-0-01373), funded by the Korea government (MSIT, the Ministry of Science and Information & Communication Technology).
Abstract: Much as humans focus solely on object movement to understand actions, directing a deep learning model's attention to the core contexts within videos is crucial for improving video comprehension. In a recent study, Video Masked Auto-Encoder (VideoMAE) employs a pre-training approach with a high ratio of tube masking and reconstruction, effectively mitigating the spatial bias caused by temporal redundancy in full video frames. This steers the model's focus toward detailed temporal contexts. However, because VideoMAE still relies on full video frames during the action recognition stage, its attention may progressively shift towards spatial contexts, degrading its ability to capture the main spatio-temporal contexts. To address this issue, we propose an attention-directing module named Transformer Encoder Attention Module (TEAM). The module directs the model's attention to the core characteristics within each video, inherently mitigating spatial bias. TEAM first identifies the core features among all features extracted from each video. It then discerns the specific parts of the video where those features are located, encouraging the model to focus more on these informative parts. Consequently, during the action recognition stage, TEAM shifts VideoMAE's attention from spatial contexts towards the core spatio-temporal contexts. This attention shift alleviates the spatial bias in the model and simultaneously enhances its ability to capture precise video contexts. We conduct extensive experiments to explore the optimal configuration that enables TEAM to fulfill its intended design purpose and facilitates its seamless integration with the VideoMAE framework. The integrated model, i.e., VideoMAE+TEAM, outperforms the existing VideoMAE by 1.0 percentage point on Something-Something-V2 (71.3% vs. 70.3%). Moreover, qualitative comparisons demonstrate that TEAM encourages the model to disregard insignificant features and focus more on the essential video features, capturing more detailed spatio-temporal contexts within the video.
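The tube masking mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual reading of the term: one random spatial mask is sampled and reused for every frame, so a hidden patch stays hidden along the whole time axis. The function name `tube_mask`, the 0.9 ratio, and the 196-patch grid are illustrative assumptions, not values taken from the abstract.

```python
import random


def tube_mask(num_frames, num_patches, mask_ratio=0.9, seed=0):
    """Return a num_frames x num_patches grid of booleans (True = masked).

    Assumption: "tube" masking means the same spatial mask is shared by
    all frames, so temporal neighbours cannot leak a masked patch.
    """
    rng = random.Random(seed)
    num_masked = int(round(mask_ratio * num_patches))
    masked = set(rng.sample(range(num_patches), num_masked))
    frame_mask = [p in masked for p in range(num_patches)]
    # Copy the single spatial mask to every frame.
    return [list(frame_mask) for _ in range(num_frames)]


# Hypothetical sizes: 8 frames, 14 x 14 = 196 patches per frame.
mask = tube_mask(num_frames=8, num_patches=196, mask_ratio=0.9)
visible_per_frame = sum(not m for m in mask[0])
print(visible_per_frame, "visible patches per frame")
```

Because every frame shares the mask, the encoder cannot recover a hidden patch by copying it from an adjacent frame, which is the temporal-redundancy shortcut the abstract refers to.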
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. NRF-2022R1I1A1A01069526).
Abstract: Deep learning-based action classification technology has been applied to various fields, such as social safety, medical services, and sports. Analyzing an action on a practical level requires tracking multiple human bodies in an image in real time and simultaneously classifying their actions. There are various related studies on the real-time classification of actions in an image. However, existing deep learning-based action classification models have slow response times, which limits real-time analysis. In addition, they classify the action of each object with low accuracy when multiple objects appear in the image, and they incur memory overhead in processing image data. Deep learning-based action classification using one-shot object detection is proposed to overcome the limitations of multiframe-based analysis technology. The proposed method uses a one-shot object detection model and a multi-object tracking algorithm to detect and track multiple objects in the image. Then, a deep learning-based pattern classification model classifies the body action of each object in the image by reducing the data for each object to an action vector. Compared to existing studies, the constructed model shows a higher accuracy of 74.95%, and in terms of speed it performed better than current studies at 0.234 s per frame. Because the posterior neural network learns from vectors, the proposed model can classify some actions through action vector learning alone, without additional image learning. Therefore, it is expected to contribute significantly to commercializing realistic streaming data analysis technologies, such as CCTV.
Funding: National Key Research and Development Program of China, Grant/Award Number: 2018YFB1600600.
Abstract: Current studies have shown that the spatial-temporal graph convolutional network (ST-GCN) is effective for skeleton-based action recognition. However, in existing ST-GCN-based methods the temporal kernel size is usually fixed over all layers, which prevents them from fully exploiting the temporal dependency between discontinuous frames and across different sequence lengths. Besides, most of these methods use average pooling to obtain the global graph feature from vertex features, which loses much fine-grained information for action classification. To address these issues, the authors propose a novel spatial attentive and temporal dilated graph convolutional network (SATD-GCN). It contains two important components: a spatial attention pooling module (SAP) and a temporal dilated graph convolution module (TDGC). Specifically, the SAP module uses a self-attention mechanism to select the human body joints that are beneficial for action recognition, alleviating the influence of data redundancy and noise. The TDGC module extracts temporal features at different time scales, which enlarges the temporal receptive field and makes the model more robust to different motion speeds and sequence lengths. Importantly, both the SAP module and the TDGC module can be easily integrated into ST-GCN-based models and significantly improve their performance. Extensive experiments on two large-scale benchmark datasets, NTU-RGB+D and Kinetics-Skeleton, demonstrate that the authors' method achieves state-of-the-art performance for skeleton-based action recognition.
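The effect of temporal dilation on the receptive field can be shown with a small sketch. This is a generic dilated 1-D convolution over a frame sequence, not the authors' TDGC module; the kernel, the dilation rates, and the function names are illustrative assumptions.

```python
def dilated_conv1d(seq, kernel, dilation=1):
    """'Valid' 1-D convolution with `dilation` frames between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # frames covered by one output
    out = []
    for start in range(len(seq) - span + 1):
        out.append(sum(w * seq[start + k * dilation]
                       for k, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


frames = [1, 2, 3, 4, 5, 6, 7]   # toy per-frame feature values
kernel = [1, 1, 1]               # hypothetical 3-tap temporal kernel

print(dilated_conv1d(frames, kernel, dilation=1))  # adjacent frames
print(dilated_conv1d(frames, kernel, dilation=2))  # every second frame
print(receptive_field(3, [1, 2, 4]))               # three stacked layers
```

With the same three-tap kernel, a larger dilation lets each output depend on frames that are further apart, which is how a fixed kernel size can still cover discontinuous frames and longer sequences.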
Abstract: Many companies, such as those in the credit card, insurance, banking, and retail industries, rely on direct marketing. Data mining can help these institutions set marketing goals. Data mining techniques show good promise for identifying target audiences and improving the likelihood of response. In this work we investigate two data mining techniques: the Naive Bayes and the C4.5 decision tree algorithms. The goal is to predict whether a client will subscribe to a term deposit. We also present a comparative study of the performance of the two algorithms. Publicly available UCI data is used to train and test the algorithms. In addition, we extract actionable knowledge from the decision tree to support interesting and important business decisions.
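As background for the C4.5 algorithm named in this abstract, the sketch below computes the gain ratio, the criterion C4.5 uses to rank candidate splits on a categorical feature. The toy records and the feature name `housing_loan` are hypothetical and are not drawn from the UCI data used in the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, labels):
    """Information gain of splitting on `feature`, divided by split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(feature)  # penalises many-valued features
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records: does the client subscribe to a term deposit?
housing_loan = ["yes", "yes", "no", "no"]
subscribed = ["no", "no", "yes", "yes"]
print(gain_ratio(housing_loan, subscribed))
```

A feature that separates the classes perfectly receives the highest ratio, while a feature unrelated to the label scores zero, so the tree splits on the most informative attribute first.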
Abstract: Real-time video surveillance systems are commonly employed to aid security professionals in preventing crimes. The use of deep learning (DL) technologies has transformed real-time video surveillance into smart video surveillance systems that automate human behavior classification. The recognition of events in surveillance videos is a hot research topic in computer science and is gaining significant attention. Human action recognition (HAR) is treated as a crucial issue in several application areas, including smart video surveillance, where it improves the security level. Advancements in DL models help to achieve improved recognition performance. In this view, this paper presents a smart deep learning-based human behavior classification (SDL-HBC) model for real-time video surveillance. The proposed SDL-HBC model first employs adaptive median filtering (AMF) based pre-processing to reduce the noise content. The capsule network (CapsNet) model is then used to extract feature vectors, with its hyperparameters tuned using the Adam optimizer. Finally, a differential evolution (DE) with stacked autoencoder (SAE) model is applied to classify human activities in the intelligent video surveillance system. The SDL-HBC technique is validated on two benchmark datasets, including the KTH dataset. The experimental outcomes show enhanced recognition performance of the SDL-HBC technique over recent state-of-the-art approaches, with a maximum accuracy of 0.9922.
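The adaptive median filtering step can be sketched as follows. This is the textbook form of the filter (grow the window until its median is not an extreme value, then replace the pixel only if the pixel itself is an extreme value), written for a 2-D list of grayscale values. The abstract does not give the paper's exact variant, so the window limit `max_window=7` and the border clipping are assumptions.

```python
def adaptive_median_filter(img, max_window=7):
    """Textbook adaptive median filter on a 2-D list of grayscale values."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            size = 3
            while True:
                r = size // 2
                # Collect the window, clipped at the image border.
                window = sorted(
                    img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))
                )
                z_min, z_max = window[0], window[-1]
                z_med = window[len(window) // 2]
                if z_min < z_med < z_max:
                    # The median is reliable: replace the pixel only if
                    # the pixel is an extreme value of the window.
                    if not (z_min < img[y][x] < z_max):
                        out[y][x] = z_med
                    break
                size += 2
                if size > max_window:
                    out[y][x] = z_med  # window limit reached, use the median
                    break
    return out


# Toy frame with one impulse ("salt") pixel in the centre.
noisy = [
    [10, 11, 12, 11, 10],
    [11, 12, 13, 12, 11],
    [12, 13, 255, 13, 12],
    [11, 12, 13, 12, 11],
    [10, 11, 12, 11, 10],
]
cleaned = adaptive_median_filter(noisy)
print(cleaned[2][2])
```

Unlike a fixed-size median filter, this version leaves pixels that lie strictly between the local minimum and maximum untouched, so less detail is smoothed away while impulse noise is still removed.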
Funding: This research was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist) and the Soonchunhyang University Research Fund.
Abstract: Human Action Recognition (HAR) has been an active research topic in machine learning for the last few decades. Visual surveillance, robotics, and pedestrian detection are the main applications of action recognition. Computer vision researchers have introduced many HAR techniques, but they still face challenges such as redundant features and computational cost. In this article, we propose a new deep learning method for HAR. In the proposed method, video frames are first pre-processed using a global contrast approach and then used to train a deep learning model through domain transfer learning. A pre-trained ResNet-50 model is used as the deep learning model in this work. Features are extracted from two layers: Global Average Pool (GAP) and Fully Connected (FC). The features of both layers are fused by Canonical Correlation Analysis (CCA). Features are then selected using a Shannon entropy-based threshold function. The selected features are finally passed to multiple classifiers for final classification. Experiments are conducted on five publicly available datasets: IXMAS, UCF Sports, YouTube, UT-Interaction, and KTH. The accuracies on these datasets were 89.6%, 99.7%, 100%, 96.7%, and 96.6%, respectively. Comparison with existing techniques shows that the proposed method provides improved accuracy for HAR. The proposed method is also computationally fast in terms of execution time.
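One possible reading of entropy-based feature selection is sketched below. The abstract does not define its threshold function, so two choices here are assumptions made only for illustration: entropy is estimated from an equal-width histogram of each feature, and a feature is kept when its entropy is at least the mean entropy over all features.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=4):
    """Entropy (bits) of one feature after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_features(columns, bins=4):
    """Return (kept feature indices, per-feature entropies).

    Assumed threshold: the mean entropy over all features.
    """
    entropies = [shannon_entropy(col, bins) for col in columns]
    threshold = sum(entropies) / len(entropies)
    kept = [i for i, e in enumerate(entropies) if e >= threshold]
    return kept, entropies


# Hypothetical fused feature matrix, stored column-wise (one list per feature).
columns = [
    [1, 1, 1, 1, 1, 1, 1, 1],   # constant feature
    [0, 1, 2, 3, 4, 5, 6, 7],   # evenly spread feature
    [0, 0, 0, 0, 0, 0, 0, 7],   # almost constant feature
]
kept, entropies = select_features(columns)
print(kept, [round(e, 3) for e in entropies])
```

Low-entropy features vary little across samples and therefore contribute little to separating classes, which is the kind of redundancy the selection step is meant to remove.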
Funding: This research was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist) and the Soonchunhyang University Research Fund.
Abstract: Human action recognition (HAR) is an essential but challenging task for observing human movements. The problem encompasses observing variations in human movement and identifying activities with machine learning algorithms. This article addresses the challenges in activity recognition by implementing and experimenting with an intelligent segmentation, feature reduction, and feature selection framework. A novel approach is introduced for the fusion of segmented frames, and multi-level features of interest are extracted. An entropy-skewness based feature reduction technique is implemented, and the reduced features are converted into a codebook by serial-based fusion. A custom-made genetic algorithm is applied to the constructed feature codebook to select the strong and well-known features. The selected features are then used by a multi-class SVM for action identification. Comprehensive experiments are conducted on four action datasets, namely Weizmann, KTH, MuHAVi, and WVU multi-view, achieving recognition rates of 96.80%, 100%, 100%, and 100%, respectively. Analysis reveals that the proposed action recognition approach is efficient and accurate compared to existing approaches.
Abstract: This paper proposes an efficient and simple method for identity recognition in uncontrolled videos. The idea is to use images collected from the web to learn representations of actions related to identity, and to use this knowledge to automatically annotate identity in videos. Our approach is unsupervised: it can identify a person in videos, such as those on YouTube, directly through knowledge of that person's actions. Its benefits are two-fold: 1) we can improve retrieval of identity images, and 2) we can collect a database of action poses related to identity, which can then be used in tagging videos. We present simple experimental evidence that annotating identity is possible using identity-related action images collected from the web.
Funding: The National Science Foundation of China (Grant No. 61702350).
Abstract: Human action recognition has gained popularity because of its widespread applications, such as video surveillance, video retrieval, and human-computer interaction. This paper provides a comprehensive overview of notable advances made by deep neural networks in this field. First, the basic concept of action recognition and its common applications are introduced. Second, action recognition is categorized into action classification and action detection according to their respective research goals. Various deep learning frameworks for recognition tasks are then discussed in detail, and the most challenging datasets and taxonomies are briefly reviewed. Finally, the limitations of the state of the art and promising research directions are briefly outlined.