In recent years, Human Activity Recognition (HAR) models based on wearable devices have received significant attention. Previously developed HAR models use hand-crafted features to recognize human activities, which yields only basic features. The images captured by wearable sensors contain higher-level features, allowing them to be analyzed by deep learning algorithms to improve the detection and recognition of human actions. Poor lighting and limited sensor capabilities can degrade data quality, making the recognition of human actions a challenging task. Unimodal HAR approaches are not suitable for real-time environments. Therefore, an updated HAR model is developed using multiple types of data and an advanced deep learning approach. First, the required signals and sensor data are collected from standard databases. Wave features are then extracted from these signals. The extracted wave features and the sensor data are given as input for recognizing human activity. An Adaptive Hybrid Deep Attentive Network (AHDAN) is developed by combining a 1D Convolutional Neural Network (1DCNN) with a Gated Recurrent Unit (GRU) for the activity recognition process. Additionally, the Enhanced Archerfish Hunting Optimizer (EAHO) is proposed to fine-tune the network parameters and improve recognition. An experimental evaluation against various deep learning networks and heuristic algorithms is performed to confirm the effectiveness of the proposed HAR model. The EAHO-based HAR model outperforms traditional deep learning networks, with an accuracy of 95.36%, a recall of 95.25%, a specificity of 95.48%, and a precision of 95.47%. The results show that the developed model recognizes human actions effectively in less time. It also reduces computational complexity and overfitting through the use of an optimization approach.
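AHDAN pairs a 1D CNN with a GRU. As a rough illustration only, the sketch below runs one valid 1D convolution with ReLU over a toy sensor signal and feeds the result through a scalar GRU whose gate equations follow Cho et al. (2014). The signal, kernel, and weights are hypothetical. The attention mechanism and the EAHO parameter tuning are not reproduced.

```python
import math


def conv1d_relu(signal, kernel, bias=0.0):
    """Valid (unpadded) 1D convolution followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        out.append(max(0.0, s))
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_final_state(sequence, w, h0=0.0):
    """Scalar GRU (hidden size 1) over a sequence; returns the last hidden state."""
    h = h0
    for x in sequence:
        z = sigmoid(w["wz"] * x + w["uz"] * h + w["bz"])                # update gate
        r = sigmoid(w["wr"] * x + w["ur"] * h + w["br"])                # reset gate
        h_tilde = math.tanh(w["wh"] * x + w["uh"] * (r * h) + w["bh"])  # candidate state
        h = z * h + (1.0 - z) * h_tilde                                 # blend old and candidate
    return h


# Hypothetical toy accelerometer trace and weights.
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0]
features = conv1d_relu(signal, kernel=[0.25, 0.5, 0.25])
weights = {"wz": 0.5, "uz": 0.3, "bz": 0.0,
           "wr": 0.4, "ur": 0.2, "br": 0.0,
           "wh": 1.2, "uh": 0.7, "bh": -0.1}
h_final = gru_final_state(features, weights)
print(len(features), round(h_final, 4))
```

A real network uses vector-valued hidden states and learned weight matrices. The scalar version only makes the gate arithmetic visible.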
To exploit the logical structure of video sequences and improve the recognition accuracy of human actions, a novel hybrid human action detection method based on three descriptors and decision-level fusion is proposed. First, the minimal 3D spatial region containing the human action is detected by combining the frame difference method with the ViBe algorithm, and the three-dimensional histogram of oriented gradients (HOG3D) is extracted. At the same time, global descriptors based on frequency domain filtering (FDF) and local descriptors based on spatial-temporal interest points (STIP) are extracted. Principal component analysis (PCA) is used to reduce the dimensionality of the gradient histogram and the global descriptor, and a bag-of-words (BoW) model is applied to describe the STIP-based local descriptors. Finally, a linear support vector machine (SVM) is used to build a new decision-level fusion classifier. Experiments verify the performance of the multiple features, and the results show that they have good representation and generalization ability. In addition, the proposed scheme obtains very competitive results on well-known datasets in terms of mean average precision.
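The BoW step turns a variable number of STIP descriptors per video into one fixed-length vector. The following is a minimal sketch. It assumes a codebook has already been learned, commonly by k-means on training descriptors; the 2D codebook here is a hypothetical toy. Each descriptor is assigned to its nearest codeword, and the counts are L1-normalized.

```python
import math


def nearest_word(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best, best_d = 0, math.inf
    for idx, word in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if d < best_d:
            best, best_d = idx, d
    return best


def bow_histogram(descriptors, codebook):
    """L1-normalized histogram of codeword assignments for one video."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical toy codebook and STIP-like descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
print(bow_histogram(descriptors, codebook))  # [0.25, 0.5, 0.25]
```

The resulting histogram has the same length for every video, so it can feed a linear SVM alongside the PCA-reduced HOG3D and FDF vectors.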
This paper proposes a framework for human action recognition based on Procrustes analysis and Fisher vector coding (FVC). First, a pose feature is extracted from silhouette images using Procrustes analysis and locality preserving projection (LPP). The extracted feature preserves the discriminative shape information and local manifold structure of human poses and is invariant to translation, rotation, and scaling. Finally, a recognition framework based on FVC and a multi-class support vector machine is used to classify the human actions. Experimental results on benchmarks demonstrate the effectiveness of the proposed method.
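The invariance to translation, rotation, and scaling comes from Procrustes alignment. The following is a minimal sketch for 2D landmark shapes such as points sampled on a silhouette contour. Each shape is centered and scaled to unit norm. One shape is then rotated onto the other using the closed-form 2D optimal angle. The example shapes are hypothetical, and the LPP and FVC stages are not shown.

```python
import math


def normalize(shape):
    """Remove translation (center at origin) and scale (unit Frobenius norm)."""
    n = len(shape)
    cx = sum(x for x, _ in shape) / n
    cy = sum(y for _, y in shape) / n
    centered = [(x - cx, y - cy) for x, y in shape]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))  # assumes a non-degenerate shape
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_distance(a, b):
    """Residual distance between two 2D shapes after optimal translation, scale, and rotation."""
    a, b = normalize(a), normalize(b)
    # The angle maximizing sum(b_i . R a_i) has this closed form in 2D.
    num = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    den = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x - s * y, s * x + c * y) for x, y in a]
    return math.sqrt(sum((rx - bx) ** 2 + (ry - by) ** 2
                         for (rx, ry), (bx, by) in zip(rotated, b)))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
ang = math.radians(30)
moved = [(2 * (math.cos(ang) * x - math.sin(ang) * y) + 5,
          2 * (math.sin(ang) * x + math.cos(ang) * y) - 3) for x, y in square]
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]
print(procrustes_distance(square, moved))      # close to 0
print(procrustes_distance(square, rectangle))  # clearly larger
```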
This paper proposes a method to recognize human-object interactions by modeling the context between human actions and the objects being interacted with. Human-object interaction recognition is challenging because of severe occlusion between humans and objects during the interaction. Human actions and interacted objects provide strong context for each other (i.e., some actions are usually associated with specific objects), so modeling this context significantly improves the recognition accuracy of both. In the proposed method, both global and local temporal features are extracted from skeleton sequences to model human actions. Meanwhile, kernel features are used to describe the interacted objects. Finally, all candidate action-object solutions are optimized by modeling the context between them. Experimental results demonstrate the effectiveness of the method.
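The abstract does not specify how the action-object context is optimized. One simple way to see why context helps is to score every (action, object) pair jointly with a compatibility table. The sketch below adds the log-probabilities from the two streams to the log-compatibility and keeps the best pair. The probabilities, the compatibility table, and the log-linear scoring rule are all hypothetical, not the paper's formulation.

```python
import math


def joint_decision(action_probs, object_probs, compatibility, floor=1e-6):
    """Return the (action, object) pair maximizing
    log p(action) + log p(object) + log compatibility(action, object)."""
    best_pair, best_score = None, -math.inf
    for action, pa in action_probs.items():
        for obj, po in object_probs.items():
            c = compatibility.get((action, obj), floor)  # unseen pairs get a small floor
            score = math.log(pa) + math.log(po) + math.log(c)
            if score > best_score:
                best_pair, best_score = (action, obj), score
    return best_pair


# Hypothetical per-stream outputs: the action stream alone slightly prefers "call".
actions = {"drink": 0.45, "call": 0.55}
objects = {"cup": 0.7, "phone": 0.3}
context = {("drink", "cup"): 0.9, ("drink", "phone"): 0.1,
           ("call", "phone"): 0.9, ("call", "cup"): 0.1}
print(joint_decision(actions, objects, context))  # ('drink', 'cup')
```

Taken independently, the two streams would output the inconsistent pair ("call", "cup"). The joint score corrects the action using the confidently detected cup.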
Recognition of human actions by computer vision has become an active research area in recent years. Because actions vary in speed and many are highly similar to one another, current algorithms cannot achieve high recognition rates. A new human action recognition method based on multi-scale directed depth motion maps (MsdDMMs) and Log-Gabor filters is proposed. MsdDMMs are constructed within an energy framework to account for differences in the speed and temporal order of actions. Meanwhile, Log-Gabor filters are used to describe the texture details of the MsdDMMs, which capture the motion characteristics; these filters suit both texture characterization and the visual properties of the human eye. Collaborative representation is then used to classify the actions. Experimental results show that the proposed algorithm achieves accuracies of 95.79% and 96.43% on the MSRAction3D and MSRGesture3D datasets, respectively. This is higher than existing algorithms such as the super normal vector (SNV) and the hierarchical recurrent neural network (Hierarchical RNN).
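A depth motion map accumulates frame-to-frame depth changes, so the whole action leaves a single "energy" image. The following is a minimal sketch of a plain DMM on one (front) view. It assumes the depth frames are already projected. Treating different temporal step sizes as "scales" is a simplification made here for illustration. The directed, energy-based construction of MsdDMMs and the Log-Gabor and collaborative representation stages are not reproduced.

```python
def depth_motion_map(frames, step=1):
    """Sum of |D[t + step] - D[t]| over a sequence of equally sized 2D depth maps."""
    rows, cols = len(frames[0]), len(frames[0][0])
    dmm = [[0.0] * cols for _ in range(rows)]
    for t in range(len(frames) - step):
        cur, nxt = frames[t], frames[t + step]
        for i in range(rows):
            for j in range(cols):
                dmm[i][j] += abs(nxt[i][j] - cur[i][j])
    return dmm


def multi_scale_dmms(frames, steps=(1, 2)):
    """One DMM per temporal step size (an assumed notion of 'scale')."""
    return {s: depth_motion_map(frames, s) for s in steps}


# Hypothetical 2x3 depth frames in which a single raised pixel moves to the right.
frames = [[[1, 0, 0], [0, 0, 0]],
          [[0, 1, 0], [0, 0, 0]],
          [[0, 0, 1], [0, 0, 0]]]
maps = multi_scale_dmms(frames)
print(maps[1])  # [[1.0, 2.0, 1.0], [0.0, 0.0, 0.0]]
print(maps[2])  # [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```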
The development of artificial intelligence (AI) and smart home technologies has driven demand for speech recognition-based solutions. This demand stems from the quest for more intuitive and natural interaction between users and the smart devices in their homes. Speech recognition lets users control devices and perform everyday actions through spoken commands. This eliminates the need for physical interfaces or touch screens and enables tasks such as turning the lights or heating on or off, or lowering the blinds. The purpose of this study is to develop a speech-based classification model for recognizing human actions in the smart home. It also aims to demonstrate the effectiveness and feasibility of using machine learning techniques to predict categories, subcategories, and actions from sentences. A dataset labeled with categories, subcategories, and actions related to human actions in the smart home is used. The methodology applies machine learning techniques implemented in Python, using CountVectorizer to convert sentences into numerical feature representations. The classification model predicts categories, subcategories, and actions from sentences with accuracies of 82.99%, 76.19%, and 90.28%, respectively. The study concludes that machine learning techniques are effective for recognizing and classifying human actions in the smart home. This supports their feasibility in various scenarios and opens new possibilities for advanced natural language processing systems in AI and smart homes.
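To show what the CountVectorizer step produces and how a classifier can use it, here is a dependency-free sketch. It reimplements count vectorization with the same default token pattern as scikit-learn's CountVectorizer (lowercased words of two or more characters). The abstract does not name the classifier, so multinomial naive Bayes is used here purely as a stand-in. The sentences and labels are hypothetical. For the study's three targets (category, subcategory, action), one such model could be trained per target.

```python
import math
import re
from collections import Counter, defaultdict

# Same default token pattern as scikit-learn's CountVectorizer: words of 2+ characters.
TOKEN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(sentence):
    return TOKEN.findall(sentence.lower())


def fit_vocabulary(sentences):
    """Map every training token to a column index (sorted alphabetically)."""
    vocab = sorted({tok for s in sentences for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(vocab)}


def count_vector(sentence, vocab):
    """Bag-of-words counts; out-of-vocabulary tokens are ignored."""
    vec = [0] * len(vocab)
    for tok in tokenize(sentence):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


def train_naive_bayes(sentences, labels, vocab, alpha=1.0):
    """Multinomial naive Bayes with additive (Laplace) smoothing."""
    counts = defaultdict(lambda: [0] * len(vocab))
    for s, y in zip(sentences, labels):
        counts[y] = [a + b for a, b in zip(counts[y], count_vector(s, vocab))]
    priors = Counter(labels)
    model = {}
    for y, c in counts.items():
        total = sum(c) + alpha * len(vocab)
        model[y] = (math.log(priors[y] / len(labels)),
                    [math.log((ci + alpha) / total) for ci in c])
    return model


def predict(sentence, model, vocab):
    x = count_vector(sentence, vocab)

    def score(y):
        log_prior, log_likelihood = model[y]
        return log_prior + sum(xi * lp for xi, lp in zip(x, log_likelihood))

    return max(model, key=score)


# Hypothetical training sentences labelled with a device category.
sentences = ["turn on the light", "switch off the kitchen light", "dim the bedroom light",
             "turn up the heating", "set the heating to twenty degrees",
             "lower the blinds", "raise the living room blinds"]
labels = ["lighting", "lighting", "lighting", "climate", "climate", "blinds", "blinds"]
vocab = fit_vocabulary(sentences)
model = train_naive_bayes(sentences, labels, vocab)
print(predict("please turn off the light", model, vocab))
```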
Human action recognition (HAR) based on artificial intelligence reasoning is one of the most important research areas in computer vision. Major breakthroughs in this field have been made in the last few years. Research interest also continues to expand into topics such as understanding actions and scenes, studying human joints, and recognizing human posture. Many HAR techniques have been introduced in the literature. Nonetheless, redundant and irrelevant features reduce recognition accuracy. These techniques also face other challenges, such as differing viewpoints, environmental conditions, and temporal variations. In this work, a framework based on deep learning and an improved whale optimization algorithm is proposed for HAR. The framework consists of four core stages:
- initial preprocessing of frames;
- fine-tuning of pre-trained deep learning models through transfer learning (TL);
- feature fusion using a modified serial-based approach;
- selection of the best features with the improved whale optimization algorithm for final classification.

Two pre-trained deep learning models, InceptionV3 and ResNet101, are fine-tuned, and TL is used to train them on action recognition datasets. Fusion increases the length of the feature vectors, so the improved whale optimization algorithm is used to select the best features. These are finally classified using machine learning (ML) classifiers. The method was evaluated on four publicly available datasets: UT-Interaction, Hollywood, Free Viewpoint Action Recognition using Motion History Volumes (IXMAS), and UCF Sports. It achieved testing accuracies of 100%, 99.9%, 99.1%, and 100%, respectively. Compared with state-of-the-art (SOTA) techniques, the proposed method showed improved accuracy.
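The fusion and selection stages can be sketched as follows. Serial fusion is plain concatenation. Feature selection then searches over binary masks. This sketch implements the standard whale optimization algorithm (encircling, random-whale exploration, and bubble-net spiral updates), not the paper's improved variant. Continuous positions are thresholded to masks. In the paper the fitness would come from classifier performance. Here a hypothetical toy fitness rewards three "useful" features and penalizes the rest.

```python
import math
import random


def serial_fusion(*vectors):
    """Serial feature fusion: concatenate the feature vectors end to end."""
    fused = []
    for v in vectors:
        fused.extend(v)
    return fused


def to_mask(position):
    """Threshold transfer: a positive coordinate selects that feature."""
    return [1 if x > 0 else 0 for x in position]


def woa_feature_selection(fitness, dim, n_whales=10, iters=30, b=1.0, seed=0):
    """Standard whale optimization over continuous positions, thresholded to
    binary masks; `fitness(mask)` is maximized."""
    rng = random.Random(seed)
    whales = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_whales)]
    best = max(whales, key=lambda w: fitness(to_mask(w)))[:]
    best_fit = fitness(to_mask(best))
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 towards 0
        for w in whales:
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or move relative to a random whale (exploration).
                target = best if abs(A) < 1 else rng.choice(whales)
                for d in range(dim):
                    w[d] = target[d] - A * abs(C * target[d] - w[d])
            else:
                # Bubble-net spiral around the best whale.
                l = rng.uniform(-1, 1)
                for d in range(dim):
                    w[d] = abs(best[d] - w[d]) * math.exp(b * l) * math.cos(2 * math.pi * l) + best[d]
            for d in range(dim):
                w[d] = max(-4.0, min(4.0, w[d]))  # keep positions bounded
            f = fitness(to_mask(w))
            if f > best_fit:
                best, best_fit = w[:], f
    return to_mask(best), best_fit


# Hypothetical toy problem: features 0, 2 and 5 of the fused vector are useful.
fused = serial_fusion([0.3, 0.1, 0.7, 0.2], [0.9, 0.4, 0.6, 0.8])
useful = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(m for i, m in enumerate(mask) if i in useful)
    noise = sum(m for i, m in enumerate(mask) if i not in useful)
    return hits - 0.5 * noise


mask, score = woa_feature_selection(toy_fitness, dim=len(fused))
selected = [x for x, m in zip(fused, mask) if m]
print(mask, score, selected)
```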
Artificial intelligence is increasingly being applied to video analysis. This is particularly true in public safety, where video surveillance equipment such as closed-circuit television (CCTV) is used and automated analysis of video information is required. However, issues such as data size limitations and low processing speeds make real-time extraction of video data challenging. Video analysis technology applies object classification, detection, and relationship analysis to continuous 2D frame data. The meanings within the video are then analyzed based on the extracted basic data. Motion recognition is key to this analysis. It is a challenging field that analyzes human body movements, requiring the interpretation of complex movements of human joints and the relationships between various objects. Deep learning-based human skeleton detection is a representative motion recognition algorithm. Recently, motion analysis models such as the SlowFast network have also been developed with excellent performance. However, these models perform poorly in most outdoor wide-angle video environments and respond slowly, as is to be expected when motion classes are extracted from high-resolution images. The proposed method achieves a high level of extraction and accuracy by improving SlowFast's input data preprocessing and data structure. The input data are preprocessed through object tracking and background removal using YOLO and DeepSORT. Changing the existing SlowFast data structure into a frame-unit structure yields higher performance than a single model. Based on the confusion matrix, accuracies of 70.16% and 70.74% were obtained for the existing SlowFast and the proposed model, respectively, an increase of 0.58 percentage points. In detection based on behavioral classification, the existing SlowFast detected 2,341,164 cases, whereas the proposed model detected 3,119,323 cases, an increase of 33.23%.
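Both accuracies are percentages, so the 0.58 difference is an absolute change in percentage points, not a 0.58% relative gain. The short calculation below separates the two notions using only the figures reported above. The detection-count increase computes to roughly 33.2%, in line with the reported value.

```python
def percentage_point_change(old_pct, new_pct):
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change_pct(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


acc_pp = percentage_point_change(70.16, 70.74)       # about 0.58 percentage points
acc_rel = relative_change_pct(70.16, 70.74)          # about 0.83 % relative to 70.16
det_rel = relative_change_pct(2_341_164, 3_119_323)  # about 33.2 %
print(f"{acc_pp:.2f} pp, {acc_rel:.2f} %, {det_rel:.2f} %")
```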
Detecting human activities and recognizing them with temporal information from video data is currently one of the most challenging and important problems in computer vision. Video datasets are generated by cameras in various devices, which may be static or moving. The resulting recordings are referred to as untrimmed videos. Smarter monitoring has become a necessity: common, regular, and out-of-the-ordinary activities should be identified automatically using intelligent systems and computer vision technology. In a long video, human activity may occur anywhere, and there may be a single activity or multiple activities. This paper presents a deep learning-based methodology to identify the locally present human activities in video sequences captured by a single wide-view camera in a sports environment. The recognition process has four parts:
1. The video is divided into sets of frames.
2. The human body is identified in a sequence of frames.
3. The human activity is identified using a convolutional neural network.
4. The time information of the observed postures for each activity is determined using a deep learning algorithm.

The approach was tested on two sports datasets, ActivityNet and THUMOS. Three sports activities (swimming, cricket bowling, and high jump) are considered. Each is classified together with its temporal information, i.e., the start and end times of every occurrence in the video. A convolutional neural network and long short-term memory (LSTM) are used for feature extraction in temporal action recognition from sports video data. The results show that the proposed method outperforms existing methods for activity recognition in the sports domain.
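The final step attaches start and end times to each recognized activity. The sketch below turns per-frame labels into timed segments. It assumes the network already produces one label per frame; the paper's CNN/LSTM stage is not reproduced. The label "none" is a hypothetical background class, and the frame rate is an assumed input.

```python
def frames_to_segments(labels, fps, background="none"):
    """Merge consecutive identical per-frame labels into (label, start_s, end_s).

    end_s is exclusive: a segment covering frames s..e-1 ends at e / fps.
    Frames labelled `background` are not reported.
    """
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != background:
                segments.append((labels[start], start / fps, i / fps))
            start = i
    return segments


# Hypothetical per-frame predictions at 2 frames per second.
per_frame = ["none"] * 2 + ["swimming"] * 4 + ["none"] * 2 + ["high_jump"] * 2
print(frames_to_segments(per_frame, fps=2))
# [('swimming', 1.0, 3.0), ('high_jump', 4.0, 5.0)]
```

In practice, per-frame predictions are noisy, so a smoothing step (for example, a sliding-window majority vote) is often applied before merging.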
Device-free activity recognition plays a crucial role in smart buildings, security, and human-computer interaction, owing to its convenience and cost-efficiency. Traditional machine learning has made significant progress with heuristic hand-crafted features and statistical models, but it is limited by manual feature design. Deep learning overcomes this through automatic high-level feature extraction. However, its performance degrades because it requires massive annotated data and suffers from cross-site issues. To address these problems, transfer learning transfers knowledge from existing datasets while mitigating the negative effects of background dynamics. This paper surveys recent progress in deep learning and transfer learning for device-free activity recognition. We begin with the motivation for deep learning and transfer learning and introduce the major sensor modalities. We then introduce deep and transfer learning techniques for device-free human activity recognition. Finally, insights on existing works and grand challenges are summarized to promote future research.
Background: Intelligent garments, a burgeoning class of wearable devices, have extensive applications in domains such as sports training and medical rehabilitation. However, existing research on smart wearables predominantly emphasizes sensor functionality and quantity, often overlooking crucial aspects of user experience and interaction. Methods: To address this gap, this study introduces a novel real-time 3D interactive system based on intelligent garments. The system uses lightweight sensor modules to collect human motion data. It introduces a dual-stream fusion network based on pulsed neural units to classify and recognize human movements, enabling real-time interaction between users and sensors. The system also provides 3D human visualization, which renders sensor data and recognized human actions as 3D models in real time. This gives accurate and comprehensive visual feedback that helps users understand and analyze the details and features of human motion. The system has significant potential for applications in motion detection, medical monitoring, virtual reality, and other fields. Accurate classification of human actions contributes to the development of personalized training plans and injury prevention strategies. Conclusions: This study has substantial implications for intelligent garments, human motion monitoring, and digital twin visualization. The system is expected to advance wearable technology and foster a deeper understanding of human motion.
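The abstract does not define "pulsed neural units". Assuming they are spiking (pulse-emitting) neurons, the leaky integrate-and-fire (LIF) model is one common way such a unit turns a continuous input, such as a sensor reading, into a spike train. The sketch below uses a simple Euler update. The time constant, threshold, and inputs are hypothetical. The paper's dual-stream fusion architecture is not reproduced.

```python
def lif_spikes(inputs, tau=10.0, threshold=1.0, dt=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: returns a 0/1 spike train for an input sequence."""
    v = v_reset
    spikes = []
    for current in inputs:
        v += dt * (-(v - v_reset) + current) / tau  # leak towards v_reset, integrate input
        if v >= threshold:
            spikes.append(1)
            v = v_reset                              # fire, then reset the membrane potential
        else:
            spikes.append(0)
    return spikes


strong = lif_spikes([2.0] * 30)
weak = lif_spikes([0.5] * 30)
print(sum(strong), sum(weak))  # 4 0
```

A stronger input drives the potential over the threshold sooner, so the spike rate encodes input intensity. A constant input of 0.5 only approaches 0.5 and never fires.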
Medical-action recognition is crucial for ensuring the quality of medical services. With advances in deep learning, RGB camera-based human-action recognition has made huge progress. However, RGB cameras suffer from issues such as depth ambiguity and privacy violation. In this paper, we propose a novel lidar-based action-recognition algorithm for medical quality control. Point-cloud data are used to recognize doctors' hand-washing actions and record their duration. An improved anchor-to-joint (A2J) network with pyramid vision transformer and feature pyramid network modules was developed to estimate human poses. In addition, we designed a graph convolutional network for action classification based on the skeleton data. We evaluated the improved A2J network on the open-source ITOP dataset and on our medical pose estimation dataset. We also tested our medical action-recognition method in actual wards to demonstrate its effectiveness and running efficiency. The results show that the proposed algorithm effectively recognizes the actions of medical staff, with satisfactory real-time performance and 96.3% action-classification accuracy.
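The abstract does not describe the GCN architecture. For orientation, the sketch below shows the widely used graph convolution layer of Kipf and Welling, ReLU(Â X W) with Â = D^-1/2 (A + I) D^-1/2, applied to a hypothetical 4-joint skeleton. Each joint's output mixes its own coordinates with those of its neighbors. The joint layout, coordinates, and weights are illustrative only.

```python
import math


def normalized_adjacency(num_joints, edges):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, features, weights):
    """One graph convolution layer: ReLU(A_hat @ X @ W)."""
    h = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in h]


# Hypothetical 4-joint skeleton: 0 = head, 1 = neck, 2 = left hand, 3 = right hand.
edges = [(0, 1), (1, 2), (1, 3)]
a_hat = normalized_adjacency(4, edges)
joints = [[0.0, 1.7, 2.0], [0.0, 1.5, 2.0], [-0.4, 1.1, 1.8], [0.4, 1.1, 1.8]]  # x, y, z in metres
weights = [[0.5, -0.2], [0.1, 0.3], [-0.3, 0.4]]                                 # 3 -> 2 features
out = gcn_layer(a_hat, joints, weights)
print(out)
```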
Human action recognition and posture prediction aim, respectively, to recognize the actions and predict the postures of people in videos. Both are active research topics in the computer vision community and have attracted considerable attention from academia and industry. They are also preconditions for intelligent interaction and human-computer cooperation, helping machines perceive the external environment. Tremendous progress has been made in the past decade, especially since the emergence of deep learning, so a comprehensive review of recent developments is needed. In this paper, we first present the background and discuss research progress. Second, we introduce datasets and typical feature representation methods and explore advanced human action recognition and posture prediction algorithms. Finally, in view of the challenges in the field, the paper proposes research priorities. It then illustrates the importance of action recognition and posture prediction using interactive cognition in self-driving vehicles as an example.
Human action recognition from skeletal data is an important and active area of research in which the state of the art has not yet achieved near-perfect accuracy on many well-known datasets. In this paper, we introduce the Distribution of Action Movements Descriptor. This novel action descriptor is based on the distribution of the directions of joint motions between frames, over the set of all possible motions in the dataset. The descriptor is computed as a normalized histogram over a set of representative directions of the joints, which are in turn obtained via clustering. The descriptor is global in the sense that it represents the overall distribution of movement directions of an action. Even so, it partially retains the temporal structure by applying a windowing scheme. The descriptor, together with a standard classifier, outperforms several state-of-the-art techniques on many well-known datasets.
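The following is a minimal sketch of a movement-direction histogram with windowing. Each joint displacement between consecutive frames is assigned to the representative direction with the highest cosine similarity. One normalized histogram is built per temporal window, and the histograms are concatenated. In the paper the representative directions come from clustering. Here they are four fixed 2D directions, and non-overlapping equal windows are an assumption. Zero displacements are skipped.

```python
import math


def nearest_direction(v, directions):
    """Index of the representative direction with the largest cosine similarity to v."""
    norm_v = math.sqrt(sum(c * c for c in v))
    best, best_cos = 0, -math.inf
    for k, d in enumerate(directions):
        norm_d = math.sqrt(sum(c * c for c in d))
        cos = sum(a * b for a, b in zip(v, d)) / (norm_v * norm_d)
        if cos > best_cos:
            best, best_cos = k, cos
    return best


def movement_descriptor(sequence, directions, num_windows=2):
    """Concatenated per-window normalized histograms of joint movement directions.

    `sequence` is a list of frames; each frame is a list of joint coordinates.
    """
    moves = []  # (transition index, direction index) for every joint displacement
    for t in range(len(sequence) - 1):
        for p, q in zip(sequence[t], sequence[t + 1]):
            disp = [b - a for a, b in zip(p, q)]
            if not any(disp):
                continue  # a joint that did not move has no direction
            moves.append((t, nearest_direction(disp, directions)))
    n_steps = len(sequence) - 1
    descriptor = []
    for w in range(num_windows):
        lo, hi = w * n_steps / num_windows, (w + 1) * n_steps / num_windows
        hist = [0.0] * len(directions)
        for t, k in moves:
            if lo <= t < hi:
                hist[k] += 1.0
        total = sum(hist)
        descriptor.extend(h / total if total else 0.0 for h in hist)
    return descriptor


# Hypothetical 2-joint, 2D sequence: both joints move right, then up.
sequence = [[(0, 0), (1, 0)], [(1, 0), (2, 0)], [(2, 0), (3, 0)],
            [(2, 1), (3, 1)], [(2, 2), (3, 2)]]
directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # +x, -x, +y, -y
print(movement_descriptor(sequence, directions))
# [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```

A single global histogram would not distinguish "right then up" from "up then right". The two windows keep that ordering information.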
Human action recognition has become one of the most active research topics in human-computer interaction and artificial intelligence and has attracted much attention. Here, we use a low-cost optical sensor, Kinect, to capture human skeleton action information. We then propose a two-level hierarchical human action recognition model with self-selecting classifiers based on skeleton data. In particular, different optimal classifiers are selected at different coarse-grained levels by a probability voting mechanism and 10 repetitions of 10-fold cross-validation. Extensive simulations on a well-known open dataset demonstrate that the proposed method is efficient for human action recognition, achieving an average recognition rate of 94.19% and a best rate of 95.61%.
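"10 times 10-fold cross-validation" means the data are reshuffled ten times, and each shuffle is split into ten folds, giving 100 train/test evaluations per candidate classifier. The sketch below generates those splits and picks the candidate with the highest mean score. The `evaluate` callback and the candidates are hypothetical placeholders. The paper's probability voting mechanism is not shown.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            test = sorted(folds[f])
            test_set = set(test)
            train = [i for i in range(n_samples) if i not in test_set]
            yield train, test


def select_classifier(candidates, evaluate, n_samples, **kw):
    """Return the candidate name with the highest mean score over all splits.

    `evaluate(clf, train_idx, test_idx)` is a user-supplied scoring function.
    """
    splits = list(repeated_kfold(n_samples, **kw))
    mean = {name: sum(evaluate(clf, tr, te) for tr, te in splits) / len(splits)
            for name, clf in candidates.items()}
    return max(mean, key=mean.get), mean


# Toy usage: the "classifiers" are stand-ins whose evaluate() returns a fixed score.
best_name, mean_scores = select_classifier({"svm": 0.8, "knn": 0.9},
                                           lambda clf, tr, te: clf, n_samples=50)
print(best_name)
```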
Real-time video surveillance systems are commonly used to help security professionals prevent crimes. Deep learning (DL) technologies have transformed real-time video surveillance into smart video surveillance systems that automate human behavior classification. Event recognition in surveillance videos is a hot research topic in computer science and is gaining significant attention. Human action recognition (HAR) is a crucial issue in several application areas, including smart video surveillance for improving security. Advances in DL models have improved recognition performance. Accordingly, this paper presents a smart deep-based human behavior classification (SDL-HBC) model for real-time video surveillance. The SDL-HBC model uses adaptive median filtering (AMF) pre-processing to reduce noise. A capsule network (CapsNet) model then extracts feature vectors, and the hyperparameters of the CapsNet model are tuned using the Adam optimizer. Finally, the differential evolution (DE) with stacked autoencoder (SAE) model is applied to classify human activities in the intelligent video surveillance system. The SDL-HBC technique is validated on two benchmark datasets, including the KTH dataset. The experimental results show that SDL-HBC outperforms recent state-of-the-art approaches, with a maximum accuracy of 0.9922.
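The AMF pre-processing step removes impulse noise while keeping edges. The sketch below follows the classic two-stage adaptive median scheme, with edge clamping at the image borders. Stage A grows the window until its median is not itself an impulse. Stage B keeps the original pixel unless that pixel is an impulse. The maximum window size and the toy images are assumptions; the paper's settings are not given.

```python
def adaptive_median(img, max_window=7):
    """Adaptive median filter for a 2D grayscale image given as a list of lists."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(rows):
        for j in range(cols):
            w = 3
            while True:
                r = w // 2
                vals = sorted(img[min(max(i + di, 0), rows - 1)][min(max(j + dj, 0), cols - 1)]
                              for di in range(-r, r + 1) for dj in range(-r, r + 1))
                zmin, zmax, zmed = vals[0], vals[-1], vals[len(vals) // 2]
                if zmin < zmed < zmax:                 # Stage A: median is not an impulse
                    if not (zmin < img[i][j] < zmax):  # Stage B: pixel itself is an impulse
                        out[i][j] = zmed
                    break
                w += 2                                 # grow the window and retry
                if w > max_window:                     # limit reached: use the last median
                    out[i][j] = zmed
                    break
    return out


# Hypothetical 5x5 patch with one salt-noise pixel in the centre.
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255
print(adaptive_median(noisy)[2][2])  # 100

# A smooth ramp has no impulse at its centre, so that value is kept.
ramp = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
print(adaptive_median(ramp)[1][1])   # 50
```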
Human Action Recognition (HAR) attempts to recognize human actions from images and videos. The major challenge in HAR is designing an action descriptor that makes the HAR system robust across different environments. This study proposes a novel action descriptor based on two independent filters, one spatial and one spectral. The descriptor uses a Difference of Gaussian (DoG) filter to extract scale-invariant features and a Difference of Wavelet (DoW) filter to extract spectral information. The DoG and DoW features are combined to create a composite feature vector for a given test action image. Linear Discriminant Analysis (LDA), a widely used dimensionality reduction technique, is used to eliminate redundant data. Finally, a nearest-neighbor method classifies the data. Extensive simulations of the proposed strategy were run on the Weizmann and UCF 11 datasets with five-fold cross-validation. On Weizmann, the average accuracy of DoG+DoW is 83.6635%, while DoG and DoW alone reach 80.2312% and 77.4215%, respectively. On UCF 11, the average accuracy of DoG+DoW is 62.5231%, while DoG and DoW alone reach 60.3214% and 58.1247%, respectively. Accuracy is therefore higher on Weizmann than on UCF 11, and on both datasets the combined DoG+DoW descriptor improves recognition accuracy over either filter alone.
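A Difference of Gaussian response is the image blurred with a narrow Gaussian minus the image blurred with a wider one. This acts as a band-pass filter that responds to blob-like structure and vanishes on flat regions. The following is a minimal sketch with separable blurring and edge clamping. The sigma ratio of 1.6 is a classic choice for approximating the Laplacian of Gaussian and is an assumption here. The DoW filter and the LDA and nearest-neighbor stages are not shown.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]


def blur(img, sigma):
    """Separable Gaussian blur (rows, then columns) with edge clamping."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    rows, cols = len(img), len(img[0])
    tmp = [[sum(k[t + r] * img[i][min(max(j + t, 0), cols - 1)] for t in range(-r, r + 1))
            for j in range(cols)] for i in range(rows)]
    return [[sum(k[t + r] * tmp[min(max(i + t, 0), rows - 1)][j] for t in range(-r, r + 1))
             for j in range(cols)] for i in range(rows)]


def difference_of_gaussians(img, sigma1=1.0, sigma2=1.6):
    """DoG response: blur(img, sigma1) - blur(img, sigma2)."""
    a, b = blur(img, sigma1), blur(img, sigma2)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


flat = [[5.0] * 6 for _ in range(6)]
dot = [[0.0] * 9 for _ in range(9)]
dot[4][4] = 1.0
print(max(abs(v) for row in difference_of_gaussians(flat) for v in row))  # ~0 on a flat image
print(difference_of_gaussians(dot)[4][4] > 0)                            # a small blob responds
```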
Human action recognition based on skeleton information has been widely used in areas such as human-computer interaction. In this paper, we extract human skeleton data with a two-stage human pose estimation model. This model combines an improved single shot detector (SSD) algorithm with convolutional pose machines (CPM) to obtain human skeleton heatmaps. The backbone of the SSD algorithm was replaced with ResNet, which characterizes images effectively. In addition, we designed multiscale transformation rules for CPM to fuse information at different scales. We also designed a convolutional neural network that classifies the skeleton keypoint heatmaps to complete action recognition. Indoor and outdoor experiments were conducted on the Caster Moma mobile robot platform. Without an external remote control, a leader controlled the robot's real-time movement through command actions.
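The paper classifies the CPM heatmaps directly with a CNN. It is still useful to see how keypoint coordinates are usually read out of such heatmaps, for example for visualization: take the argmax of each heatmap and rescale by the network's output stride. The following is a minimal sketch. The stride, the confidence threshold, and the toy heatmaps are hypothetical.

```python
def heatmap_keypoints(heatmaps, stride=1, min_conf=0.1):
    """Return (x, y, confidence) per keypoint heatmap, or None if the peak is weak.

    `stride` rescales heatmap cells to input-image pixels (assumed uniform).
    """
    keypoints = []
    for hm in heatmaps:
        conf, row, col = max((v, i, j) for i, r in enumerate(hm) for j, v in enumerate(r))
        keypoints.append((col * stride, row * stride, conf) if conf >= min_conf else None)
    return keypoints


# Hypothetical 3x4 heatmaps for two keypoints (e.g. a wrist and an elbow).
wrist = [[0.0, 0.1, 0.2, 0.0],
         [0.1, 0.3, 0.9, 0.2],
         [0.0, 0.1, 0.2, 0.0]]
elbow = [[0.05] * 4 for _ in range(3)]  # no clear peak
print(heatmap_keypoints([wrist, elbow], stride=8))
# [(16, 8, 0.9), None]
```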
Funding (hybrid human action detection based on three descriptors and decision-level fusion): supported by the National Natural Science Foundation of China under Grant No. 61503424; the Research Project of the State Ethnic Affairs Commission under Grant No. 14ZYZ017; the Jiangsu Future Networks Innovation Institute-Prospective Research Project on Future Networks under Grant No. BY2013095-2-14; the Fundamental Research Funds for the Central Universities under No. FRF-TP-14-046A2; and the first-class discipline construction transitional funds of Minzu University of China.
Funding (human action recognition based on Procrustes analysis and Fisher vector coding): National Natural Science Foundation of China (No. 61602148); Natural Science Foundation of Fujian Province, China (No. 2016J01040); Xiamen University of Technology High Level Talents Project, China (No. YKJ15018R).
Abstract: This paper proposes a method to recognize human-object interactions by modeling the context between human actions and interacted objects. Human-object interaction recognition is challenging because of severe occlusion between the human and objects during the interaction. Because human actions and interacted objects provide strong mutual context (i.e., some actions are usually related to specific objects), modeling this context significantly improves recognition accuracy for both. In the proposed method, both global and local temporal features are extracted from skeleton sequences to model human actions, while kernel features are used to describe interacted objects. Finally, all candidate action-object solutions are optimized by modeling the context between them. Experimental results demonstrate the effectiveness of the method.
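How context can correct an individual prediction is easiest to see with a toy joint decision. This sketch is an assumption, not the paper's optimization: it scores every (action, object) pair as log p(action) + log p(object) + log compatibility and picks the best pair by exhaustive search. All labels, probabilities, and compatibility values are hypothetical.

```python
import math
from itertools import product
from typing import Dict, Tuple


def joint_inference(action_probs: Dict[str, float],
                    object_probs: Dict[str, float],
                    compat: Dict[Tuple[str, str], float]) -> Tuple[str, str]:
    """Pick the (action, object) pair maximizing log p(a) + log p(o) + log compat(a, o)."""
    best, best_score = None, -math.inf
    for a, o in product(action_probs, object_probs):
        c = compat.get((a, o), 1e-6)  # unseen pairs get a small floor instead of zero
        score = math.log(action_probs[a]) + math.log(object_probs[o]) + math.log(c)
        if score > best_score:
            best, best_score = (a, o), score
    return best


# Hypothetical outputs: alone, the action classifier slightly prefers "call".
actions = {"drink": 0.45, "call": 0.55}
objects = {"cup": 0.6, "phone": 0.4}
compat = {("drink", "cup"): 0.9, ("drink", "phone"): 0.1,
          ("call", "phone"): 0.9, ("call", "cup"): 0.1}
print(joint_inference(actions, objects, compat))
```

Here the confident "cup" detection and the strong drink-cup compatibility flip the action decision from "call" to "drink".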
Funding: Sponsored by the Jiangsu Prospective Joint Research Project (Grant No. BY2016022-28).
Abstract: Recognition of human actions by computer vision has become an active research area in recent years. Because actions are performed at different speeds and many are highly similar, current algorithms cannot achieve high recognition rates. A new human action recognition method is proposed based on multi-scale directed depth motion maps (MsdDMMs) and Log-Gabor filters. To account for differences in the speed and temporal order of an action, MsdDMMs are constructed within an energy framework. Meanwhile, Log-Gabor filters are used to describe the texture details of the MsdDMMs as motion characteristics; they suit both texture characterization and the visual characteristics of the human eye. Collaborative representation is then employed for classification. Experimental results show that the proposed algorithm achieves accuracies of 95.79% and 96.43% on the MSRAction3D and MSRGesture3D datasets, respectively, higher than existing algorithms such as the super normal vector (SNV) and the hierarchical recurrent neural network (Hierarchical RNN).
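The basic depth motion map (DMM) accumulates absolute depth changes between frames; varying the frame step gives one map per temporal scale. This sketch shows only that baseline idea and assumes depth frames already projected onto a single view; the "directed" component and the Log-Gabor texture stage of MsdDMMs are not reproduced, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence

Frame = List[List[float]]


def depth_motion_map(frames: List[Frame], stride: int = 1) -> Frame:
    """Accumulate absolute depth differences between frames `stride` steps apart."""
    h, w = len(frames[0]), len(frames[0][0])
    dmm = [[0.0] * w for _ in range(h)]
    for t in range(0, len(frames) - stride, stride):
        cur, nxt = frames[t], frames[t + stride]
        for i in range(h):
            for j in range(w):
                dmm[i][j] += abs(nxt[i][j] - cur[i][j])
    return dmm


def multi_scale_dmms(frames: List[Frame], strides: Sequence[int] = (1, 2, 4)) -> Dict[int, Frame]:
    """One DMM per temporal scale: small strides capture fast motion, large strides slow motion."""
    return {s: depth_motion_map(frames, s) for s in strides}


# Four tiny 1x2 "depth frames" of a hypothetical sequence.
frames = [[[0, 0]], [[1, 0]], [[1, 2]], [[3, 2]]]
print(multi_scale_dmms(frames, (1, 2)))
```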
Funding: Supported by Generalitat Valenciana with HAAS (CIAICO/2021/039) and by the Spanish Ministry of Science and Innovation under the Project AVANTIA PID2020-114480RB-I00.
Abstract: The development of artificial intelligence (AI) and smart home technologies has driven the need for speech recognition-based solutions. This demand stems from the quest for more intuitive and natural interaction between users and the smart devices in their homes. Speech recognition lets users control devices and perform everyday actions through spoken commands, eliminating the need for physical interfaces or touch screens, for tasks such as turning the lights or heating on or off or lowering the blinds. The purpose of this study is to develop a speech-based classification model for recognizing human actions in the smart home and to demonstrate the effectiveness and feasibility of using machine learning techniques to predict categories, subcategories, and actions from sentences. A dataset labeled with categories, subcategories, and actions related to human actions in the smart home is used. The methodology uses machine learning techniques implemented in Python, with features extracted using CountVectorizer to convert sentences into numerical representations. The results show that the classification model predicts categories, subcategories, and actions from sentences with accuracies of 82.99%, 76.19%, and 90.28%, respectively. The study concludes that machine learning techniques are effective for recognizing and classifying human actions in the smart home, supporting their feasibility in various scenarios and opening new possibilities for advanced natural language processing systems in AI and smart homes.
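CountVectorizer turns each sentence into a vector of token counts over a learned vocabulary. The study uses scikit-learn's `CountVectorizer` directly; the dependency-free sketch below only mimics its default behaviour (lowercasing, the default token pattern that keeps tokens of two or more word characters, an alphabetically ordered vocabulary, and ignoring unseen words at transform time). The function names and example commands are hypothetical.

```python
import re
from typing import Dict, List

# Same default token pattern as scikit-learn's CountVectorizer.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(sentence: str) -> List[str]:
    return TOKEN_PATTERN.findall(sentence.lower())


def fit_vocabulary(sentences: List[str]) -> Dict[str, int]:
    """Alphabetically ordered vocabulary, mirroring CountVectorizer's default column order."""
    terms = sorted({tok for s in sentences for tok in tokenize(s)})
    return {term: idx for idx, term in enumerate(terms)}


def transform(sentences: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    rows = []
    for s in sentences:
        row = [0] * len(vocab)
        for tok in tokenize(s):
            if tok in vocab:  # words not seen during fitting are ignored
                row[vocab[tok]] += 1
        rows.append(row)
    return rows


train = ["Turn on the light", "Lower the blinds", "Turn off the heating"]
vocab = fit_vocabulary(train)
print(transform(["Turn the light on, turn it off"], vocab))
```

The resulting count vectors are what a downstream classifier would be trained on to predict the category, subcategory, and action labels.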
Funding: This research work is supported in part by Chiang Mai University and HITEC University.
Abstract: Human action recognition (HAR) based on artificial intelligence reasoning is one of the most important research areas in computer vision. Major breakthroughs have been made in this field in the last few years, and research interest continues to expand into topics such as understanding actions and scenes, studying human joints, and recognizing human posture. Many HAR techniques have been introduced in the literature. Nonetheless, redundant and irrelevant features reduce recognition accuracy, and these techniques face other challenges such as differing viewpoints, environmental conditions, and temporal variations. In this work, a framework based on deep learning and an improved whale optimization algorithm is proposed for HAR. The framework consists of four core stages: initial frame preprocessing; fine-tuning of pre-trained deep learning models through transfer learning (TL); feature fusion using a modified serial-based approach; and selection of the best features with the improved whale optimization algorithm for final classification. Two pre-trained deep learning models, InceptionV3 and ResNet101, are fine-tuned, and TL is used to train them on action recognition datasets. Because the fusion process increases the length of the feature vectors, the improved whale optimization algorithm is used to select the best features, which are finally classified using machine learning (ML) classifiers. Four publicly accessible datasets, UT-Interaction, Hollywood, Free Viewpoint Action Recognition using Motion History Volumes (IXMAS), and UCF Sports (Center for Research in Computer Vision, UCF), were used, and testing accuracies of 100%, 99.9%, 99.1%, and 100% were achieved, respectively. Compared with state-of-the-art (SOTA) techniques, the proposed method showed improved accuracy.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2020R1A6A1A03040583) and by Kyonggi University's Graduate Research Assistantship 2023.
Abstract: Artificial intelligence is increasingly being applied to video analysis, particularly in public safety, where video surveillance equipment such as closed-circuit television (CCTV) is used and automated analysis of video information is required. However, issues such as data size limitations and low processing speeds make real-time extraction of video data challenging. Video analysis technology applies object classification, detection, and relationship analysis to continuous 2D frame data, and the various meanings within the video are then analyzed based on the extracted basic data. Motion recognition is key to this analysis. It is a challenging field that analyzes human body movements, requiring the interpretation of complex movements of human joints and of the relationships between various objects. Deep learning-based human skeleton detection is a representative motion recognition algorithm, and motion analysis models such as the SlowFast network have recently been developed with excellent performance. However, these models do not operate properly in most outdoor wide-angle video environments: their response speed is low, as is expected when extracting motion classifications from high-resolution images. The proposed method achieves a high level of extraction and accuracy by improving SlowFast's input data preprocessing and data structure. The input data are preprocessed through object tracking and background removal using YOLO and DeepSORT, and higher performance than that of a single model is achieved by changing the existing SlowFast data structure into a frame-unit structure. Based on the confusion matrix, the existing SlowFast and the proposed model obtained accuracies of 70.16% and 70.74%, respectively, an increase of 0.58 percentage points. In detection based on behavioral classification, the existing SlowFast detected 2,341,164 cases, whereas the proposed model detected 3,119,323 cases, an increase of 33.23%.
Funding: This work was supported by the Deanship of Scientific Research at King Khalid University through a General Research Project under Grant Number GRP/41/42.
Abstract: Detecting human activities and recognizing them with temporal information from video data is one of the most challenging and important problems in computer vision. Video datasets are generated by cameras in various devices, which may be static or moving, and are referred to as untrimmed videos. Smarter monitoring, in which common, regular, and out-of-the-ordinary activities are identified automatically using intelligent systems and computer vision technology, has become a necessity. In a long video, human activity may appear anywhere, and the video may contain a single activity or multiple activities. This paper presents a deep learning-based methodology to identify the locally present human activities in video sequences captured by a single wide-view camera in a sports environment. The recognition process has four parts: first, the video is divided into sets of frames; next, the human body in each sequence of frames is identified; then, the human activity is identified using a convolutional neural network; and finally, the temporal information of the observed postures for each activity is determined with a deep learning algorithm. The approach has been tested on two datasets, ActivityNet and THUMOS. Three sports activities, swimming, cricket bowling, and high jump, are classified together with their temporal information, i.e., the start and end time of every activity present in the video. A convolutional neural network and a long short-term memory (LSTM) network are used for feature extraction for temporal action recognition from sports video data. The results show that the proposed method for activity recognition in the sports domain outperforms existing methods.
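The final step, turning recognized postures into start and end times, can be illustrated with a common post-processing routine. This is an assumption rather than the paper's algorithm: it supposes the CNN-LSTM emits one activity label per frame (`None` for background) and merges runs of identical labels into timed segments using the video frame rate. The function name and labels are hypothetical.

```python
from typing import List, Optional, Tuple


def frames_to_segments(labels: List[Optional[str]],
                       fps: float) -> List[Tuple[str, float, float]]:
    """Merge runs of identical per-frame labels into (activity, start_s, end_s) segments.

    `None` marks background frames. The end time is the boundary after the last frame.
    """
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None:
                segments.append((labels[start], start / fps, i / fps))
            start = i
    return segments


labels = [None, None, "swimming", "swimming", "swimming", None, "high_jump", "high_jump"]
print(frames_to_segments(labels, fps=2.0))
```

In practice a smoothing step (for example, dropping very short runs) is often applied before merging, so that single misclassified frames do not split a segment.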
Funding: This work is supported by the NTU Presidential Postdoctoral Fellowship and the "Adaptive Multimodal Learning for Robust Sensing and Recognition in Smart Cities" project fund at Nanyang Technological University, Singapore.
Abstract: Device-free activity recognition plays a crucial role in smart buildings, security, and human–computer interaction, owing to its convenience and cost-efficiency. Traditional machine learning has made significant progress with heuristic hand-crafted features and statistical models, but it is limited by manual feature design. Deep learning overcomes this through automatic high-level feature extraction, but its performance degrades because it requires massive annotated data and suffers from cross-site issues. To deal with these problems, transfer learning helps transfer knowledge from existing datasets while mitigating the negative effect of background dynamics. This paper surveys recent progress in deep learning and transfer learning for device-free activity recognition. We begin with the motivation for deep learning and transfer learning and introduce the major sensor modalities. Then the deep and transfer learning techniques for device-free human activity recognition are presented. Finally, insights on existing work and grand challenges are summarized to promote future research.
Funding: Supported by the National Natural Science Foundation of China (62202346); Hubei Key Research and Development Program (2021BAA042); Open project of Engineering Research Center of Hubei Province for Clothing Information (2022HBCI01); Wuhan Applied Basic Frontier Research Project (2022013988065212); MIIT's AI Industry Innovation Task Unveils Flagship Projects (Key Technologies, Equipment, and Systems for Flexible Customized and Intelligent Manufacturing in the Clothing Industry); Hubei Science and Technology Project of Safe Production Special Fund (Scene Control Platform Based on Proprioception Information Computing of Artificial Intelligence).
Abstract: Background Intelligent garments, a burgeoning class of wearable devices, have extensive applications in domains such as sports training and medical rehabilitation. Nonetheless, existing research on smart wearables predominantly emphasizes sensor functionality and quantity, often overlooking crucial aspects of user experience and interaction. Methods To address this gap, this study introduces a novel real-time 3D interactive system based on intelligent garments. The system uses lightweight sensor modules to collect human motion data and introduces a dual-stream fusion network based on pulsed neural units to classify and recognize human movements, thereby achieving real-time interaction between users and sensors. The system also incorporates 3D human visualization, which visualizes sensor data and renders recognized human actions as 3D models in real time, providing accurate and comprehensive visual feedback that helps users understand and analyze the details and features of human motion. The system has significant potential for applications in motion detection, medical monitoring, virtual reality, and other fields, and accurate classification of human actions supports the development of personalized training plans and injury prevention strategies. Conclusions This study has substantial implications for intelligent garments, human motion monitoring, and digital twin visualization, and the system is expected to advance wearable technology and foster a deeper understanding of human motion.
Abstract: Medical-action recognition is crucial for ensuring the quality of medical services. With advancements in deep learning, RGB camera-based human-action recognition has made huge progress. However, RGB cameras suffer from issues such as depth ambiguity and privacy violation. In this paper, we propose a novel lidar-based action-recognition algorithm for medical quality control, in which point-cloud data are used to recognize doctors' hand-washing actions and record each action's duration. An improved anchor-to-joint (A2J) network with pyramid vision transformer and feature pyramid network modules was developed to estimate human poses. In addition, we designed a graph convolution network for action classification based on the skeleton data. We evaluated the improved A2J network on the open-source ITOP dataset and on our medical pose estimation dataset, and we tested our medical action-recognition method in actual wards to demonstrate its effectiveness and running efficiency. The results show that the proposed algorithm effectively recognizes the actions of medical staff, with satisfactory real-time performance and 96.3% action-classification accuracy.
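A graph convolution over skeleton data mixes each joint's features with those of its connected joints. The sketch shows one generic layer, ReLU(Â X W), with the widely used symmetric normalization Â = D^{-1/2}(A + I)D^{-1/2}; this is an assumption for illustration, since the abstract does not specify the paper's network. The skeleton edges, feature sizes, and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(edges: Sequence[Tuple[int, int]], num_joints: int) -> Matrix:
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(num_joints)]
            for i in range(num_joints)]


def gcn_layer(x: Matrix, a_hat: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_hat X W), where X is (joints x features)."""
    h = matmul(matmul(a_hat, x), w)
    return [[max(0.0, v) for v in row] for row in h]


# Hypothetical 3-joint chain (e.g., shoulder-elbow-wrist) with one feature per joint.
a_hat = normalized_adjacency([(0, 1), (1, 2)], 3)
print(gcn_layer([[1.0], [0.0], [0.0]], a_hat, [[1.0]]))
```

Stacking such layers and pooling over joints (and frames) yields a skeleton-level representation that a final linear layer can classify.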
Funding: Supported by the National Natural Science Foundation of China (Nos. 61871038 and 61931012), the Premium Funding Project for Academic Human Resources Development of Beijing Union University (No. BPHR2020AZ02), and the Generic Pre-research Program of the Equipment Development Department in Military Commission (No. 41412040302).
Abstract: Human action recognition and posture prediction aim, respectively, to recognize the actions and to predict the postures of persons in videos. Both are active research topics in the computer vision community and have attracted considerable attention from academia and industry. They are also prerequisites for intelligent interaction and human-computer cooperation, and they help machines perceive the external environment. In the past decade, tremendous progress has been made in the field, especially since the emergence of deep learning technologies, so a comprehensive review of recent developments is needed. In this paper, we first present the background and discuss research progress. Second, we introduce datasets and various typical feature representation methods and explore advanced human action recognition and posture prediction algorithms. Finally, in view of the challenges in the field, this paper identifies research priorities and illustrates the importance of action recognition and posture prediction using interactive cognition in self-driving vehicles as an example.
Abstract: Human action recognition from skeletal data is an important and active area of research in which the state of the art has not yet achieved near-perfect accuracy on many well-known datasets. In this paper, we introduce the Distribution of Action Movements Descriptor, a novel action descriptor based on the distribution of the directions of joint motions between frames over the set of all possible motions in the dataset. The descriptor is computed as a normalized histogram over a set of representative joint motion directions, which are in turn obtained via clustering. While the descriptor is global in the sense that it represents the overall distribution of movement directions of an action, it partially retains the action's temporal structure by applying a windowing scheme. The descriptor, together with a standard classifier, outperforms several state-of-the-art techniques on many well-known datasets.
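The histogram-plus-windowing construction described above can be sketched as follows. Assumptions: the representative directions are given (the paper obtains them by clustering, which is not shown), each joint motion is assigned to the direction with the highest cosine similarity, and consecutive windows share one boundary frame so that no motion is lost or double-counted. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _unit(v: Sequence[float]) -> Tuple[float, ...]:
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v) if n > 0 else tuple(v)


def direction_histogram(frames: List[List[Vec]], directions: List[Vec]) -> List[float]:
    """Normalized histogram of joint motion directions between consecutive frames."""
    dirs = [_unit(d) for d in directions]
    hist = [0.0] * len(dirs)
    for prev, cur in zip(frames, frames[1:]):
        for p, c in zip(prev, cur):
            motion = tuple(ci - pi for pi, ci in zip(p, c))
            if all(m == 0 for m in motion):
                continue  # a static joint has no direction
            u = _unit(motion)
            best = max(range(len(dirs)), key=lambda k: sum(a * b for a, b in zip(u, dirs[k])))
            hist[best] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def windowed_descriptor(frames: List[List[Vec]], directions: List[Vec],
                        num_windows: int = 2) -> List[float]:
    """Concatenate per-window histograms to retain coarse temporal order."""
    size = math.ceil(len(frames) / num_windows)
    desc: List[float] = []
    for w in range(num_windows):
        window = frames[w * size:(w + 1) * size + 1]  # +1 so motion across the boundary is counted
        desc.extend(direction_histogram(window, directions))
    return desc


# One joint moving along +x for two steps, then along +y for two steps.
frames = [[(0, 0, 0)], [(1, 0, 0)], [(2, 0, 0)], [(2, 1, 0)], [(2, 2, 0)]]
dirs = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(direction_histogram(frames, dirs), windowed_descriptor(frames, dirs))
```

The global histogram cannot tell "x then y" from "y then x"; the windowed version can, which is the point of the windowing scheme.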
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11475003, 61603003, and 11471093; the Key Project of Cultivation of Leading Talents in Universities of Anhui Province under Grant No. gxfxZD2016174; Funds of Integration of Cloud Computing and Big Data, Innovation of Science and Technology of Ministry of Education of China under Grant No. 2017A09116; and the Anhui Provincial Department of Education Outstanding Top-Notch Talent-Funded Project under Grant No. gxbjZD26.
Abstract: Human action recognition has become one of the most active research topics in human-computer interaction and artificial intelligence and has attracted much attention. Here, we use a low-cost optical sensor, the Kinect, to capture action information from the human skeleton. We then propose a two-level hierarchical human action recognition model with self-selecting classifiers based on skeleton data. In particular, different optimal classifiers are selected at different coarse-grained levels through a probability voting mechanism and 10 times 10-fold cross-validation. Extensive simulations on a well-known open dataset demonstrate that the proposed method is efficient for human action recognition, achieving an average recognition rate of 94.19% and a best rate of 95.61%.
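"10 times 10-fold cross-validation" means repeating 10-fold cross-validation on 10 independent shuffles of the data, giving 100 train/test splits per candidate classifier. A minimal splitter, assuming plain random shuffling without stratification; the probability voting mechanism is not shown and the function name is hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_samples: int, k: int = 10, repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for `repeats` independent shuffles of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # round-robin folds differ in size by at most 1
        for f in range(k):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test


print(sum(1 for _ in repeated_kfold(100)))
```

Selecting a classifier at a given level would then amount to averaging each candidate's accuracy over all the splits and keeping the best one.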
Abstract: Real-time video surveillance systems are commonly employed to help security professionals prevent crimes. Deep learning (DL) technologies have transformed real-time video surveillance into smart video surveillance systems that automate human behavior classification. Recognizing events in surveillance videos is a hot research topic in computer science and is gaining significant attention. Human action recognition (HAR) is a crucial issue in several application areas, including smart video surveillance for improving security, and advances in DL models have improved recognition performance. In this view, this paper presents a smart deep-based human behavior classification (SDL-HBC) model for real-time video surveillance. The SDL-HBC model first applies adaptive median filtering (AMF)-based pre-processing to reduce noise. A capsule network (CapsNet) model is then used to extract feature vectors, and its hyperparameters are tuned using the Adam optimizer. Finally, differential evolution (DE) with a stacked autoencoder (SAE) model is applied to classify human activities in the intelligent video surveillance system. The SDL-HBC technique was validated on two benchmark datasets, including the KTH dataset. The experimental outcomes show enhanced recognition performance of SDL-HBC over recent state-of-the-art approaches, with a maximum accuracy of 0.9922.
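Adaptive median filtering enlarges its window only where needed, so it removes impulse noise while preserving detail better than a fixed-size median. The sketch follows the classic two-stage textbook variant (grow the window until the median is not an extreme value, then keep the pixel unless it is itself an extreme); the paper's exact parameters are not given, so `max_window` and the edge replication at borders are assumptions.

```python
from typing import List

Image = List[List[int]]


def adaptive_median_filter(img: Image, max_window: int = 7) -> Image:
    """Classic two-stage adaptive median filter on a grayscale image (odd window sizes)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            size = 3
            while True:
                r = size // 2
                # Border pixels are handled by replicating the nearest edge pixel.
                window = sorted(
                    img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1)
                )
                z_min, z_max = window[0], window[-1]
                z_med = window[len(window) // 2]
                if z_min < z_med < z_max:
                    # Stage B: keep the pixel unless it is itself an extreme (likely impulse noise).
                    z = img[y][x]
                    out[y][x] = z if z_min < z < z_max else z_med
                    break
                size += 2  # Stage A failed: the median may be noise, so enlarge the window.
                if size > max_window:
                    out[y][x] = z_med
                    break
    return out


noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255  # a single "salt" pixel
print(adaptive_median_filter(noisy)[2][2])
```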
Abstract: Human action recognition (HAR) attempts to recognize human actions from images and videos. The major challenge in HAR is designing an action descriptor that makes the HAR system robust across different environments. This study proposes a novel action descriptor based on two independent filters, one spatial and one spectral. The descriptor uses a Difference of Gaussian (DoG) filter to extract scale-invariant features and a Difference of Wavelet (DoW) filter to extract spectral information. The DoG and DoW features are combined to create a composite feature vector for a given test action image. Linear Discriminant Analysis (LDA), a widely used dimensionality reduction technique, is also used to eliminate redundant data. Finally, a nearest-neighbor method is used for classification. Extensive simulations of the proposed strategy were run on the Weizmann and UCF 11 datasets with five-fold cross-validation. On Weizmann, the average accuracy of DoG+DoW is 83.6635%, while DoG and DoW alone achieve 80.2312% and 77.4215%, respectively. On UCF 11, the average accuracy of DoG+DoW is 62.5231%, while DoG and DoW alone achieve 60.3214% and 58.1247%, respectively. On both datasets the combined DoG+DoW descriptor outperforms either filter alone, verifying its effectiveness in improving recognition accuracy; accuracy on Weizmann is also higher than on UCF 11.
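A Difference of Gaussian filter subtracts a wider Gaussian blur from a narrower one, acting as a band-pass filter that responds to blob-like structure at a chosen scale. The sketch shows DoG on a grayscale image and a nearest-neighbor classifier; the sigma values (a 1.6 ratio is a common choice) are assumptions, and the DoW filter and LDA stages are not shown. Function names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def gaussian_kernel(sigma: float) -> List[float]:
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]  # normalized so a flat image stays flat


def blur(img: Image, sigma: float) -> Image:
    """Separable Gaussian blur (rows, then columns) with edge replication."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    tmp = [[sum(k[i + r] * img[y][min(max(x + i, 0), w - 1)] for i in range(-r, r + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(k[i + r] * tmp[min(max(y + i, 0), h - 1)][x] for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def difference_of_gaussians(img: Image, sigma1: float = 1.0, sigma2: float = 1.6) -> Image:
    a, b = blur(img, sigma1), blur(img, sigma2)
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def nearest_neighbor(query: List[float], train: List[List[float]], labels: List[str]) -> str:
    """Return the label of the training vector with the smallest squared Euclidean distance."""
    dists = [sum((q - t) ** 2 for q, t in zip(query, row)) for row in train]
    return labels[dists.index(min(dists))]


spot = [[0.0] * 9 for _ in range(9)]
spot[4][4] = 1.0
print(round(difference_of_gaussians(spot)[4][4], 3))
```

Flat regions give a near-zero response, while small bright structures give a positive peak; the flattened response map (after dimensionality reduction) would be the feature vector passed to the nearest-neighbor classifier.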
Funding: This study was supported by the National Natural Science Foundation of China (Grant Nos. 91948201 and 62073191).
Abstract: Human action recognition based on skeleton information has been used extensively in areas such as human-computer interaction. In this paper, we extracted human skeleton data by constructing a two-stage human pose estimation model that combines an improved single shot detector (SSD) algorithm with convolutional pose machines (CPM) to obtain human skeleton heatmaps. The backbone of the SSD algorithm was replaced with ResNet, which characterizes images effectively. In addition, we designed multiscale transformation rules for CPM to fuse information at different scales, and a convolutional neural network that classifies the skeleton keypoint heatmaps to complete action recognition. Indoor and outdoor experiments were conducted on the Caster Moma mobile robot platform, where, without an external remote control, the robot's movement was controlled in real time by a leader through command actions.
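Pose machines output one heatmap per keypoint, and a keypoint's location is typically read off as the heatmap maximum. The sketch assumes the multiscale fusion can be approximated by averaging heatmaps that have already been resized to a common resolution; the paper's actual multiscale transformation rules are not specified here, and the function names are hypothetical.

```python
from typing import List, Tuple

Heatmap = List[List[float]]


def fuse_scales(heatmaps_per_scale: List[Heatmap]) -> Heatmap:
    """Average same-size heatmaps predicted at different input scales."""
    n = len(heatmaps_per_scale)
    h, w = len(heatmaps_per_scale[0]), len(heatmaps_per_scale[0][0])
    return [[sum(hm[y][x] for hm in heatmaps_per_scale) / n for x in range(w)]
            for y in range(h)]


def keypoint_from_heatmap(hm: Heatmap) -> Tuple[int, int, float]:
    """Return (row, col, confidence) of the heatmap's maximum."""
    best = (0, 0, hm[0][0])
    for y, row in enumerate(hm):
        for x, v in enumerate(row):
            if v > best[2]:
                best = (y, x, v)
    return best


# Hypothetical 3x3 heatmaps for one joint at two scales; the second has a spurious peak.
small_scale = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.8], [0.0, 0.0, 0.0]]
large_scale = [[0.7, 0.0, 0.0], [0.0, 0.0, 0.6], [0.0, 0.0, 0.0]]
print(keypoint_from_heatmap(fuse_scales([small_scale, large_scale])))
```

Averaging across scales suppresses a peak that appears at only one scale, which is one simple motivation for fusing multiscale outputs before locating keypoints.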