In recent years,wearable devices-based Human Activity Recognition(HAR)models have received significant attention.Previously developed HAR models use hand-crafted features to recognize human activities,leading to the e...In recent years,wearable devices-based Human Activity Recognition(HAR)models have received significant attention.Previously developed HAR models use hand-crafted features to recognize human activities,leading to the extraction of basic features.The images captured by wearable sensors contain advanced features,allowing them to be analyzed by deep learning algorithms to enhance the detection and recognition of human actions.Poor lighting and limited sensor capabilities can impact data quality,making the recognition of human actions a challenging task.The unimodal-based HAR approaches are not suitable in a real-time environment.Therefore,an updated HAR model is developed using multiple types of data and an advanced deep-learning approach.Firstly,the required signals and sensor data are accumulated from the standard databases.From these signals,the wave features are retrieved.Then the extracted wave features and sensor data are given as the input to recognize the human activity.An Adaptive Hybrid Deep Attentive Network(AHDAN)is developed by incorporating a“1D Convolutional Neural Network(1DCNN)”with a“Gated Recurrent Unit(GRU)”for the human activity recognition process.Additionally,the Enhanced Archerfish Hunting Optimizer(EAHO)is suggested to fine-tune the network parameters for enhancing the recognition process.An experimental evaluation is performed on various deep learning networks and heuristic algorithms to confirm the effectiveness of the proposed HAR model.The EAHO-based HAR model outperforms traditional deep learning networks with an accuracy of 95.36,95.25 for recall,95.48 for specificity,and 95.47 for precision,respectively.The result proved that the developed model is effective in recognizing human action by taking less time.Additionally,it reduces the computation complexity and overfitting issue through using an optimization approach.展开更多
In order to take advantage of the logical structure of video sequences and improve the recognition accuracy of the human action, a novel hybrid human action detection method based on three descriptors and decision lev...In order to take advantage of the logical structure of video sequences and improve the recognition accuracy of the human action, a novel hybrid human action detection method based on three descriptors and decision level fusion is proposed. Firstly, the minimal 3D space region of human action region is detected by combining frame difference method and Vi BE algorithm, and the three-dimensional histogram of oriented gradient(HOG3D) is extracted. At the same time, the characteristics of global descriptors based on frequency domain filtering(FDF) and the local descriptors based on spatial-temporal interest points(STIP) are extracted. Principal component analysis(PCA) is implemented to reduce the dimension of the gradient histogram and the global descriptor, and bag of words(BoW) model is applied to describe the local descriptors based on STIP. Finally, a linear support vector machine(SVM) is used to create a new decision level fusion classifier. Some experiments are done to verify the performance of the multi-features, and the results show that they have good representation ability and generalization ability. Otherwise, the proposed scheme obtains very competitive results on the well-known datasets in terms of mean average precision.展开更多
This paper proposes a framework for human action recognition based on procrustes analysis and Fisher vector coding(FVC).Firstly,we applied a pose feature extracted from silhouette image by employing Procrustes analysi...This paper proposes a framework for human action recognition based on procrustes analysis and Fisher vector coding(FVC).Firstly,we applied a pose feature extracted from silhouette image by employing Procrustes analysis and local preserving projection(LPP).Secondly,the extracted feature can preserve the discriminative shape information and local manifold structure of human pose and is invariant to translation,rotation and scaling.Finally,after the pose feature was extracted,a recognition framework based on FVC and multi-class supporting vector machine was employed to classify the human action.Experimental results on benchmarks demonstrate the effectiveness of the proposed method.展开更多
This paper proposes a method to recognize human-object interactions by modeling context between human actions and interacted objects.Human-object interaction recognition is a challenging task due to severe occlusion b...This paper proposes a method to recognize human-object interactions by modeling context between human actions and interacted objects.Human-object interaction recognition is a challenging task due to severe occlusion between human and objects during the interacting process.Since that human actions and interacted objects provide strong context information,i.e.some actions are usually related to some specific objects,the accuracy of recognition is significantly improved for both of them.Through the proposed method,both global and local temporal features from skeleton sequences are extracted to model human actions.In the meantime,kernel features are utilized to describe interacted objects.Finally,all possible solutions from actions and objects are optimized by modeling the context between them.The results of experiments demonstrate the effectiveness of our method.展开更多
Recognition of the human actions by computer vision has become an active research area in recent years. Due to the speed and the high similarity of the actions, the current algorithms cannot get high recognition rate....Recognition of the human actions by computer vision has become an active research area in recent years. Due to the speed and the high similarity of the actions, the current algorithms cannot get high recognition rate. A new recognition method of the human action is proposed with the multi-scale directed depth motion maps(MsdDMMs) and Log-Gabor filters. According to the difference between the speed and time order of an action, MsdDMMs is proposed under the energy framework. Meanwhile, Log-Gabor is utilized to describe the texture details of MsdDMMs for the motion characteristics. It can easily satisfy both the texture characterization and the visual features of human eye. Furthermore, the collaborative representation is employed as action recognition by the classification. Experimental results show that the proposed algorithm, which is applied in the MSRAction3 D dataset and MSRGesture3 D dataset, can achieve the accuracy of 95.79% and 96.43% respectively. It also has higher accuracy than the existing algorithms, such as super normal vector(SNV), hierarchical recurrent neural network(Hierarchical RNN).展开更多
The development of artificial intelligence(AI)and smart home technologies has driven the need for speech recognition-based solutions.This demand stems from the quest for more intuitive and natural interaction between ...The development of artificial intelligence(AI)and smart home technologies has driven the need for speech recognition-based solutions.This demand stems from the quest for more intuitive and natural interaction between users and smart devices in their homes.Speech recognition allows users to control devices and perform everyday actions through spoken commands,eliminating the need for physical interfaces or touch screens and enabling specific tasks such as turning on or off the light,heating,or lowering the blinds.The purpose of this study is to develop a speech-based classification model for recognizing human actions in the smart home.It seeks to demonstrate the effectiveness and feasibility of using machine learning techniques in predicting categories,subcategories,and actions from sentences.A dataset labeled with relevant information about categories,subcategories,and actions related to human actions in the smart home is used.The methodology uses machine learning techniques implemented in Python,extracting features using CountVectorizer to convert sentences into numerical representations.The results show that the classification model is able to accurately predict categories,subcategories,and actions based on sentences,with 82.99%accuracy for category,76.19%accuracy for subcategory,and 90.28%accuracy for action.The study concludes that using machine learning techniques is effective for recognizing and classifying human actions in the smart home,supporting its feasibility in various scenarios and opening new possibilities for advanced natural language processing systems in the field of AI and smart homes.展开更多
Human action recognition(HAR)based on Artificial intelligence reasoning is the most important research area in computer vision.Big breakthroughs in this field have been observed in the last few years;additionally,the ...Human action recognition(HAR)based on Artificial intelligence reasoning is the most important research area in computer vision.Big breakthroughs in this field have been observed in the last few years;additionally,the interest in research in this field is evolving,such as understanding of actions and scenes,studying human joints,and human posture recognition.Many HAR techniques are introduced in the literature.Nonetheless,the challenge of redundant and irrelevant features reduces recognition accuracy.They also faced a few other challenges,such as differing perspectives,environmental conditions,and temporal variations,among others.In this work,a deep learning and improved whale optimization algorithm based framework is proposed for HAR.The proposed framework consists of a few core stages i.e.,frames initial preprocessing,fine-tuned pre-trained deep learning models through transfer learning(TL),features fusion using modified serial based approach,and improved whale optimization based best features selection for final classification.Two pre-trained deep learning models such as InceptionV3 and Resnet101 are fine-tuned and TL is employed to train on action recognition datasets.The fusion process increases the length of feature vectors;therefore,improved whale optimization algorithm is proposed and selects the best features.The best selected features are finally classified usingmachine learning(ML)classifiers.Four publicly accessible datasets such as Ut-interaction,Hollywood,Free Viewpoint Action Recognition usingMotion History Volumes(IXMAS),and centre of computer vision(UCF)Sports,are employed and achieved the testing accuracy of 100%,99.9%,99.1%,and 100%respectively.Comparison with state of the art techniques(SOTA),the proposed method showed the improved accuracy.展开更多
Artificial intelligence is increasingly being applied in the field of video analysis,particularly in the area of public safety where video surveillance equipment such as closed-circuit television(CCTV)is used and auto...Artificial intelligence is increasingly being applied in the field of video analysis,particularly in the area of public safety where video surveillance equipment such as closed-circuit television(CCTV)is used and automated analysis of video information is required.However,various issues such as data size limitations and low processing speeds make real-time extraction of video data challenging.Video analysis technology applies object classification,detection,and relationship analysis to continuous 2D frame data,and the various meanings within the video are thus analyzed based on the extracted basic data.Motion recognition is key in this analysis.Motion recognition is a challenging field that analyzes human body movements,requiring the interpretation of complex movements of human joints and the relationships between various objects.The deep learning-based human skeleton detection algorithm is a representative motion recognition algorithm.Recently,motion analysis models such as the SlowFast network algorithm,have also been developed with excellent performance.However,these models do not operate properly in most wide-angle video environments outdoors,displaying low response speed,as expected from motion classification extraction in environments associated with high-resolution images.The proposed method achieves high level of extraction and accuracy by improving SlowFast’s input data preprocessing and data structure methods.The input data are preprocessed through object tracking and background removal using YOLO and DeepSORT.A higher performance than that of a single model is achieved by improving the existing SlowFast’s data structure into a frame unit structure.Based on the confusion matrix,accuracies of 70.16%and 70.74%were obtained for the existing SlowFast and proposed model,respectively,indicating a 0.58%increase in accuracy.Comparing detection,based on behavioral classification,the existing SlowFast detected 2,341,164 cases,whereas the proposed model detected 3,119,323 cases,which is an increase of 33.23%.展开更多
毫米波雷达凭借其出色的环境适应性、高分辨率和隐私保护等优势,在智能家居、智慧养老和安防监控等领域具有广泛的应用前景。毫米波雷达三维点云是一种重要的空间数据表达形式,对于人体行为姿态识别具有极大的价值。然而,由于毫米波雷...毫米波雷达凭借其出色的环境适应性、高分辨率和隐私保护等优势,在智能家居、智慧养老和安防监控等领域具有广泛的应用前景。毫米波雷达三维点云是一种重要的空间数据表达形式,对于人体行为姿态识别具有极大的价值。然而,由于毫米波雷达点云具有强稀疏性,给精准快速识别人体动作带来了巨大的挑战。针对这一问题,该文公开了一个毫米波雷达人体动作三维点云数据集mmWave-3DPCHM-1.0,并提出了相应的数据处理方法和人体动作识别模型。该数据集由TI公司的IWR1443-ISK和Vayyar公司的vBlu射频成像模组分别采集,包括常见的12种人体动作,如走路、挥手、站立和跌倒等。在网络模型方面,该文将边缘卷积(EdgeConv)与Transformer相结合,提出了一种处理长时序三维点云的网络模型,即Point EdgeConv and Transformer(PETer)网络。该网络通过边缘卷积对三维点云逐帧创建局部有向邻域图,以提取单帧点云的空间几何特征,并通过堆叠多个编码器的Transformer模块,提取多帧点云之间的时序关系。实验结果表明,所提出的PETer网络在所构建的TI数据集和Vayyar数据集上的平均识别准确率分别达到98.77%和99.51%,比传统最优的基线网络模型提高了大约5%,且网络规模仅为1.09 M,适于在存储受限的边缘设备上部署。展开更多
Nowadays,the most challenging and important problem of computer vision is to detect human activities and recognize the same with temporal information from video data.The video datasets are generated using cameras avai...Nowadays,the most challenging and important problem of computer vision is to detect human activities and recognize the same with temporal information from video data.The video datasets are generated using cameras available in various devices that can be in a static or dynamic position and are referred to as untrimmed videos.Smarter monitoring is a historical necessity in which commonly occurring,regular,and out-of-the-ordinary activities can be automatically identified using intelligence systems and computer vision technology.In a long video,human activity may be present anywhere in the video.There can be a single ormultiple human activities present in such videos.This paper presents a deep learning-based methodology to identify the locally present human activities in the video sequences captured by a single wide-view camera in a sports environment.The recognition process is split into four parts:firstly,the video is divided into different set of frames,then the human body part in a sequence of frames is identified,next process is to identify the human activity using a convolutional neural network and finally the time information of the observed postures for each activity is determined with the help of a deep learning algorithm.The proposed approach has been tested on two different sports datasets including ActivityNet and THUMOS.Three sports activities like swimming,cricket bowling and high jump have been considered in this paper and classified with the temporal information i.e.,the start and end time for every activity present in the video.The convolutional neural network and long short-term memory are used for feature extraction of temporal action recognition from video data of sports activity.The outcomes show that the proposed method for activity recognition in the sports domain outperforms the existing methods.展开更多
Device-free activity recognition plays a crucial role in smart building,security,and human–computer interaction,which shows its strength in its convenience and cost-efficiency.Traditional machine learning has made si...Device-free activity recognition plays a crucial role in smart building,security,and human–computer interaction,which shows its strength in its convenience and cost-efficiency.Traditional machine learning has made significant progress by heuristic hand-crafted features and statistical models,but it suffers from the limitation of manual feature design.Deep learning overcomes such issues by automatic high-level feature extraction,but its performance degrades due to the requirement of massive annotated data and cross-site issues.To deal with these problems,transfer learning helps to transfer knowledge from existing datasets while dealing with the negative effect of background dynamics.This paper surveys the recent progress of deep learning and transfer learning for device-free activity recognition.We begin with the motivation of deep learning and transfer learning,and then introduce the major sensor modalities.Then the deep and transfer learning techniques for device-free human activity recognition are introduced.Eventually,insights on existing works and grand challenges are summarized and presented to promote future research.展开更多
This paper proposes the research on human body behavior recognition based on vision. Behavior based on high-level human structure can describe behavior more accurately, but it is dif? cult to extract the behavioral c...This paper proposes the research on human body behavior recognition based on vision. Behavior based on high-level human structure can describe behavior more accurately, but it is dif? cult to extract the behavioral characteristics while often relying on the accuracy of the human pose estimation. Moving object extraction of the moving targets in video analysis as the main content, research based on the image sequence robust, fast moving target extraction, motion estimation and target description algorithm, and the correlation between motion detection is to use frame, frame by comparing the difference between for change and not change area. The model is proposed based on the probability theory, and the future research will be focused on the simulation.展开更多
针对人体动作识别任务中特征值选取不当导致识别率低、使用多模态数据导致训练成本高等问题,提出一种轻量级人体动作识别方法。首先使用OpenPose、PoseNet提取出人体骨架信息,使用BWT69CL传感器提取姿势信息;其次对数据进行预处理、特...针对人体动作识别任务中特征值选取不当导致识别率低、使用多模态数据导致训练成本高等问题,提出一种轻量级人体动作识别方法。首先使用OpenPose、PoseNet提取出人体骨架信息,使用BWT69CL传感器提取姿势信息;其次对数据进行预处理、特征融合,对人体动作进行深度学习分类识别;最后,为验证此方法的有效性,在公开数据集WISDM、UCIHAR、HASC和自建的人体动作数据集上进行实验验证,并使用改进的目标引导注意力机制(target-guided attention,TGA)–长短期记忆(long short term memory,LSTM)网络输出最终的分类结果。实验结果表明,在自建数据集下融合姿势和骨架特征达到99.87%准确率,相比于只使用姿势信息特征,识别准确率提高了约5.31个百分点;相比于只使用人体骨架特征,识别准确率提高了约1.87个百分点;在识别时间上相比于只使用姿势信息,识别时间降低了约29.73 s;相比于只使用人体骨架数据,识别时间降低了约9 s。使用该方法能及时有效地反映人体的运动意图,有助于提高人体动作和行为的识别准确率和训练效率。展开更多
文摘In recent years,wearable devices-based Human Activity Recognition(HAR)models have received significant attention.Previously developed HAR models use hand-crafted features to recognize human activities,leading to the extraction of basic features.The images captured by wearable sensors contain advanced features,allowing them to be analyzed by deep learning algorithms to enhance the detection and recognition of human actions.Poor lighting and limited sensor capabilities can impact data quality,making the recognition of human actions a challenging task.The unimodal-based HAR approaches are not suitable in a real-time environment.Therefore,an updated HAR model is developed using multiple types of data and an advanced deep-learning approach.Firstly,the required signals and sensor data are accumulated from the standard databases.From these signals,the wave features are retrieved.Then the extracted wave features and sensor data are given as the input to recognize the human activity.An Adaptive Hybrid Deep Attentive Network(AHDAN)is developed by incorporating a“1D Convolutional Neural Network(1DCNN)”with a“Gated Recurrent Unit(GRU)”for the human activity recognition process.Additionally,the Enhanced Archerfish Hunting Optimizer(EAHO)is suggested to fine-tune the network parameters for enhancing the recognition process.An experimental evaluation is performed on various deep learning networks and heuristic algorithms to confirm the effectiveness of the proposed HAR model.The EAHO-based HAR model outperforms traditional deep learning networks with an accuracy of 95.36,95.25 for recall,95.48 for specificity,and 95.47 for precision,respectively.The result proved that the developed model is effective in recognizing human action by taking less time.Additionally,it reduces the computation complexity and overfitting issue through using an optimization approach.
基金supported by the National Natural Science Foundation of China under Grant No. 61503424the Research Project by The State Ethnic Affairs Commission under Grant No. 14ZYZ017+2 种基金the Jiangsu Future Networks Innovation Institute-Prospective Research Project on Future Networks under Grant No. BY2013095-2-14the Fundamental Research Funds for the Central Universities No. FRF-TP-14-046A2the first-class discipline construction transitional funds of Minzu University of China
文摘In order to take advantage of the logical structure of video sequences and improve the recognition accuracy of the human action, a novel hybrid human action detection method based on three descriptors and decision level fusion is proposed. Firstly, the minimal 3D space region of human action region is detected by combining frame difference method and Vi BE algorithm, and the three-dimensional histogram of oriented gradient(HOG3D) is extracted. At the same time, the characteristics of global descriptors based on frequency domain filtering(FDF) and the local descriptors based on spatial-temporal interest points(STIP) are extracted. Principal component analysis(PCA) is implemented to reduce the dimension of the gradient histogram and the global descriptor, and bag of words(BoW) model is applied to describe the local descriptors based on STIP. Finally, a linear support vector machine(SVM) is used to create a new decision level fusion classifier. Some experiments are done to verify the performance of the multi-features, and the results show that they have good representation ability and generalization ability. Otherwise, the proposed scheme obtains very competitive results on the well-known datasets in terms of mean average precision.
基金National Natural Science Foundation of China(No.61602148)Natural Science Foundation of Fujian Province,China(No.2016J01040)Xiamen University of Technology High Level Talents Project,China(No.YKJ15018R)
文摘This paper proposes a framework for human action recognition based on procrustes analysis and Fisher vector coding(FVC).Firstly,we applied a pose feature extracted from silhouette image by employing Procrustes analysis and local preserving projection(LPP).Secondly,the extracted feature can preserve the discriminative shape information and local manifold structure of human pose and is invariant to translation,rotation and scaling.Finally,after the pose feature was extracted,a recognition framework based on FVC and multi-class supporting vector machine was employed to classify the human action.Experimental results on benchmarks demonstrate the effectiveness of the proposed method.
文摘This paper proposes a method to recognize human-object interactions by modeling context between human actions and interacted objects.Human-object interaction recognition is a challenging task due to severe occlusion between human and objects during the interacting process.Since that human actions and interacted objects provide strong context information,i.e.some actions are usually related to some specific objects,the accuracy of recognition is significantly improved for both of them.Through the proposed method,both global and local temporal features from skeleton sequences are extracted to model human actions.In the meantime,kernel features are utilized to describe interacted objects.Finally,all possible solutions from actions and objects are optimized by modeling the context between them.The results of experiments demonstrate the effectiveness of our method.
基金Sponsored by the Jiangsu Prospective Joint Research Project(Grant No.BY2016022-28)
文摘Recognition of the human actions by computer vision has become an active research area in recent years. Due to the speed and the high similarity of the actions, the current algorithms cannot get high recognition rate. A new recognition method of the human action is proposed with the multi-scale directed depth motion maps(MsdDMMs) and Log-Gabor filters. According to the difference between the speed and time order of an action, MsdDMMs is proposed under the energy framework. Meanwhile, Log-Gabor is utilized to describe the texture details of MsdDMMs for the motion characteristics. It can easily satisfy both the texture characterization and the visual features of human eye. Furthermore, the collaborative representation is employed as action recognition by the classification. Experimental results show that the proposed algorithm, which is applied in the MSRAction3 D dataset and MSRGesture3 D dataset, can achieve the accuracy of 95.79% and 96.43% respectively. It also has higher accuracy than the existing algorithms, such as super normal vector(SNV), hierarchical recurrent neural network(Hierarchical RNN).
基金supported by Generalitat Valenciana with HAAS(CIAICO/2021/039)the Spanish Ministry of Science and Innovation under the Project AVANTIA PID2020-114480RB-I00.
文摘The development of artificial intelligence(AI)and smart home technologies has driven the need for speech recognition-based solutions.This demand stems from the quest for more intuitive and natural interaction between users and smart devices in their homes.Speech recognition allows users to control devices and perform everyday actions through spoken commands,eliminating the need for physical interfaces or touch screens and enabling specific tasks such as turning on or off the light,heating,or lowering the blinds.The purpose of this study is to develop a speech-based classification model for recognizing human actions in the smart home.It seeks to demonstrate the effectiveness and feasibility of using machine learning techniques in predicting categories,subcategories,and actions from sentences.A dataset labeled with relevant information about categories,subcategories,and actions related to human actions in the smart home is used.The methodology uses machine learning techniques implemented in Python,extracting features using CountVectorizer to convert sentences into numerical representations.The results show that the classification model is able to accurately predict categories,subcategories,and actions based on sentences,with 82.99%accuracy for category,76.19%accuracy for subcategory,and 90.28%accuracy for action.The study concludes that using machine learning techniques is effective for recognizing and classifying human actions in the smart home,supporting its feasibility in various scenarios and opening new possibilities for advanced natural language processing systems in the field of AI and smart homes.
基金This research work is supported in part by Chiang Mai University and HITEC University.
文摘Human action recognition(HAR)based on Artificial intelligence reasoning is the most important research area in computer vision.Big breakthroughs in this field have been observed in the last few years;additionally,the interest in research in this field is evolving,such as understanding of actions and scenes,studying human joints,and human posture recognition.Many HAR techniques are introduced in the literature.Nonetheless,the challenge of redundant and irrelevant features reduces recognition accuracy.They also faced a few other challenges,such as differing perspectives,environmental conditions,and temporal variations,among others.In this work,a deep learning and improved whale optimization algorithm based framework is proposed for HAR.The proposed framework consists of a few core stages i.e.,frames initial preprocessing,fine-tuned pre-trained deep learning models through transfer learning(TL),features fusion using modified serial based approach,and improved whale optimization based best features selection for final classification.Two pre-trained deep learning models such as InceptionV3 and Resnet101 are fine-tuned and TL is employed to train on action recognition datasets.The fusion process increases the length of feature vectors;therefore,improved whale optimization algorithm is proposed and selects the best features.The best selected features are finally classified usingmachine learning(ML)classifiers.Four publicly accessible datasets such as Ut-interaction,Hollywood,Free Viewpoint Action Recognition usingMotion History Volumes(IXMAS),and centre of computer vision(UCF)Sports,are employed and achieved the testing accuracy of 100%,99.9%,99.1%,and 100%respectively.Comparison with state of the art techniques(SOTA),the proposed method showed the improved accuracy.
基金supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2020R1A6A1A03040583)supported by Kyonggi University’s Graduate Research Assistantship 2023.
文摘Artificial intelligence is increasingly being applied in the field of video analysis,particularly in the area of public safety where video surveillance equipment such as closed-circuit television(CCTV)is used and automated analysis of video information is required.However,various issues such as data size limitations and low processing speeds make real-time extraction of video data challenging.Video analysis technology applies object classification,detection,and relationship analysis to continuous 2D frame data,and the various meanings within the video are thus analyzed based on the extracted basic data.Motion recognition is key in this analysis.Motion recognition is a challenging field that analyzes human body movements,requiring the interpretation of complex movements of human joints and the relationships between various objects.The deep learning-based human skeleton detection algorithm is a representative motion recognition algorithm.Recently,motion analysis models such as the SlowFast network algorithm,have also been developed with excellent performance.However,these models do not operate properly in most wide-angle video environments outdoors,displaying low response speed,as expected from motion classification extraction in environments associated with high-resolution images.The proposed method achieves high level of extraction and accuracy by improving SlowFast’s input data preprocessing and data structure methods.The input data are preprocessed through object tracking and background removal using YOLO and DeepSORT.A higher performance than that of a single model is achieved by improving the existing SlowFast’s data structure into a frame unit structure.Based on the confusion matrix,accuracies of 70.16%and 70.74%were obtained for the existing SlowFast and proposed model,respectively,indicating a 0.58%increase in accuracy.Comparing detection,based on behavioral classification,the existing SlowFast detected 2,341,164 cases,whereas the proposed model detected 3,119,323 cases,which is an increase of 33.23%.
文摘毫米波雷达凭借其出色的环境适应性、高分辨率和隐私保护等优势,在智能家居、智慧养老和安防监控等领域具有广泛的应用前景。毫米波雷达三维点云是一种重要的空间数据表达形式,对于人体行为姿态识别具有极大的价值。然而,由于毫米波雷达点云具有强稀疏性,给精准快速识别人体动作带来了巨大的挑战。针对这一问题,该文公开了一个毫米波雷达人体动作三维点云数据集mmWave-3DPCHM-1.0,并提出了相应的数据处理方法和人体动作识别模型。该数据集由TI公司的IWR1443-ISK和Vayyar公司的vBlu射频成像模组分别采集,包括常见的12种人体动作,如走路、挥手、站立和跌倒等。在网络模型方面,该文将边缘卷积(EdgeConv)与Transformer相结合,提出了一种处理长时序三维点云的网络模型,即Point EdgeConv and Transformer(PETer)网络。该网络通过边缘卷积对三维点云逐帧创建局部有向邻域图,以提取单帧点云的空间几何特征,并通过堆叠多个编码器的Transformer模块,提取多帧点云之间的时序关系。实验结果表明,所提出的PETer网络在所构建的TI数据集和Vayyar数据集上的平均识别准确率分别达到98.77%和99.51%,比传统最优的基线网络模型提高了大约5%,且网络规模仅为1.09 M,适于在存储受限的边缘设备上部署。
基金This work was supported by the Deanship of Scientific Research at King Khalid University through a General Research Project under Grant Number GRP/41/42.
文摘Nowadays,the most challenging and important problem of computer vision is to detect human activities and recognize the same with temporal information from video data.The video datasets are generated using cameras available in various devices that can be in a static or dynamic position and are referred to as untrimmed videos.Smarter monitoring is a historical necessity in which commonly occurring,regular,and out-of-the-ordinary activities can be automatically identified using intelligence systems and computer vision technology.In a long video,human activity may be present anywhere in the video.There can be a single ormultiple human activities present in such videos.This paper presents a deep learning-based methodology to identify the locally present human activities in the video sequences captured by a single wide-view camera in a sports environment.The recognition process is split into four parts:firstly,the video is divided into different set of frames,then the human body part in a sequence of frames is identified,next process is to identify the human activity using a convolutional neural network and finally the time information of the observed postures for each activity is determined with the help of a deep learning algorithm.The proposed approach has been tested on two different sports datasets including ActivityNet and THUMOS.Three sports activities like swimming,cricket bowling and high jump have been considered in this paper and classified with the temporal information i.e.,the start and end time for every activity present in the video.The convolutional neural network and long short-term memory are used for feature extraction of temporal action recognition from video data of sports activity.The outcomes show that the proposed method for activity recognition in the sports domain outperforms the existing methods.
基金This work is supported by NTU Presidential Postdoctoral Fellowship,"Adaptive Multimodal Learning for Robust Sensing and Recognition in Smart Cities"project fund,in Nanyang Technological University,Singapore.
文摘Device-free activity recognition plays a crucial role in smart building,security,and human–computer interaction,which shows its strength in its convenience and cost-efficiency.Traditional machine learning has made significant progress by heuristic hand-crafted features and statistical models,but it suffers from the limitation of manual feature design.Deep learning overcomes such issues by automatic high-level feature extraction,but its performance degrades due to the requirement of massive annotated data and cross-site issues.To deal with these problems,transfer learning helps to transfer knowledge from existing datasets while dealing with the negative effect of background dynamics.This paper surveys the recent progress of deep learning and transfer learning for device-free activity recognition.We begin with the motivation of deep learning and transfer learning,and then introduce the major sensor modalities.Then the deep and transfer learning techniques for device-free human activity recognition are introduced.Eventually,insights on existing works and grand challenges are summarized and presented to promote future research.
文摘This paper proposes the research on human body behavior recognition based on vision. Behavior based on high-level human structure can describe behavior more accurately, but it is dif? cult to extract the behavioral characteristics while often relying on the accuracy of the human pose estimation. Moving object extraction of the moving targets in video analysis as the main content, research based on the image sequence robust, fast moving target extraction, motion estimation and target description algorithm, and the correlation between motion detection is to use frame, frame by comparing the difference between for change and not change area. The model is proposed based on the probability theory, and the future research will be focused on the simulation.
文摘针对人体动作识别任务中特征值选取不当导致识别率低、使用多模态数据导致训练成本高等问题,提出一种轻量级人体动作识别方法。首先使用OpenPose、PoseNet提取出人体骨架信息,使用BWT69CL传感器提取姿势信息;其次对数据进行预处理、特征融合,对人体动作进行深度学习分类识别;最后,为验证此方法的有效性,在公开数据集WISDM、UCIHAR、HASC和自建的人体动作数据集上进行实验验证,并使用改进的目标引导注意力机制(target-guided attention,TGA)–长短期记忆(long short term memory,LSTM)网络输出最终的分类结果。实验结果表明,在自建数据集下融合姿势和骨架特征达到99.87%准确率,相比于只使用姿势信息特征,识别准确率提高了约5.31个百分点;相比于只使用人体骨架特征,识别准确率提高了约1.87个百分点;在识别时间上相比于只使用姿势信息,识别时间降低了约29.73 s;相比于只使用人体骨架数据,识别时间降低了约9 s。使用该方法能及时有效地反映人体的运动意图,有助于提高人体动作和行为的识别准确率和训练效率。