Internet of Vehicles (IoV) is a new system that enables individual vehicles to connect with nearby vehicles, people, transportation infrastructure, and networks, thereby realizing a more intelligent and efficient transportation system. The movement of vehicles and the three-dimensional (3D) nature of the road network give the topological structure of IoV high space and time complexity. Network modeling and structure recognition for 3D roads can benefit the description of topological changes in IoV. This paper proposes a general 3D road model based on discrete points of roads obtained from GIS. First, the constraints imposed by 3D roads on moving vehicles are analyzed. Then the effects of road curvature radius (Ra), longitudinal slope (Slo), and length (Len) on speed and acceleration are studied. Finally, a general 3D road network model based on road section features is established. This paper also presents intersection and road section recognition methods based on the structural features of the 3D road network model and the road features. Real GIS data from a specific region of Beijing are adopted to create the simulation scenario, and the simulation results validate the general 3D road network model and the recognition method. This work therefore contributes to the field of intelligent transportation by providing a comprehensive approach to modeling the 3D road network and its topological changes, toward efficient traffic flow and improved road safety.
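As a hypothetical illustration of one constraint a 3D road imposes on a moving vehicle (the paper's actual model is not reproduced here), the friction-limited speed on a flat curve follows the textbook relation v = sqrt(mu * g * Ra); the function name and the friction coefficient mu are assumptions:

```python
import math

def max_curve_speed(radius_m: float, mu: float = 0.7, g: float = 9.81) -> float:
    """Friction-limited speed (m/s) on a flat curve of radius Ra.

    Textbook approximation v = sqrt(mu * g * Ra), not the paper's model;
    mu is an assumed tire-road friction coefficient.
    """
    return math.sqrt(mu * g * radius_m)
```

Under these assumptions, a 100 m radius curve allows roughly 26 m/s, and quadrupling the radius doubles the admissible speed.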
Appearance-based dynamic Hand Gesture Recognition (HGR) remains a prominent area of research in Human-Computer Interaction (HCI). Numerous environmental and computational constraints limit its real-time deployment. In addition, the performance of a model decreases as the subject's distance from the camera increases. This study proposes a 3D separable Convolutional Neural Network (CNN), considering the model's computational complexity and recognition accuracy. The 20BN-Jester dataset was used to train the model for six gesture classes. After achieving the best offline recognition accuracy of 94.39%, the model was deployed in real time while considering the subject's attention, the instant of performing a gesture, and the subject's distance from the camera. Despite being discussed in numerous research articles, the distance factor remains unresolved in real-time deployment, which leads to degraded recognition results. In the proposed approach, the distance calculation substantially improves classification performance by reducing the impact of the subject's distance from the camera. Additionally, the feature-extraction capability, degree of relevance, and statistical significance of the proposed model against other state-of-the-art models were validated using t-distributed Stochastic Neighbor Embedding (t-SNE), the Matthews Correlation Coefficient (MCC), and the McNemar test, respectively. We observed that the proposed model exhibits state-of-the-art outcomes and a comparatively high significance level.
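The MCC used for validation above can be computed directly from a binary confusion matrix; a minimal sketch (the function name is ours, and a real pipeline would typically call scikit-learn's `matthews_corrcoef` instead):

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; by convention 0.0 when the denominator
    vanishes (a degenerate confusion matrix).
    """
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```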
Background Intelligent garments, a burgeoning class of wearable devices, have extensive applications in domains such as sports training and medical rehabilitation. Nonetheless, existing research in the smart wearables domain predominantly emphasizes sensor functionality and quantity, often overlooking crucial aspects of user experience and interaction. Methods To address this gap, this study introduces a novel real-time 3D interactive system based on intelligent garments. The system utilizes lightweight sensor modules to collect human motion data and introduces a dual-stream fusion network based on pulsed neural units to classify and recognize human movements, thereby achieving real-time interaction between users and sensors. Additionally, the system incorporates 3D human visualization functionality, which renders sensor data and recognized human actions as 3D models in real time, providing accurate and comprehensive visual feedback that helps users better understand and analyze the details and features of human motion. The system has significant potential for applications in motion detection, medical monitoring, virtual reality, and other fields; the accurate classification of human actions contributes to the development of personalized training plans and injury prevention strategies. Conclusions This study has substantial implications for intelligent garments, human motion monitoring, and digital twin visualization. The advancement of this system is expected to propel the progress of wearable technology and foster a deeper comprehension of human motion.
The staggered distribution of joints and fissures in space constitutes the weak part of any rock mass. The identification of rock mass structural planes and the extraction of characteristic parameters are the basis of rock-mass integrity evaluation, which is very important for the analysis of slope stability. Laser scanning can acquire coordinate information for each point of a structural plane, but the large volume of point cloud data, uneven density distribution, and noise interference limit the efficiency and accuracy with which different types of structural planes can be identified by point cloud analysis. A new point cloud identification and segmentation algorithm for rock mass structural surfaces is proposed. Based on the distribution of the original point cloud across different spatial neighborhoods, the points are characterized by multi-dimensional eigenvalues and processed with the robust randomized Hough transform (RRHT). The normal vector difference and the final eigenvalue are proposed for characteristic distinction, and the identification of rock mass structural surfaces is completed through region growing, which strengthens the difference expression of the point clouds. In addition, nearest-voxel downsampling is introduced into the RRHT calculation, further reducing sources of neighborhood noise and thereby improving the accuracy and stability of the calculation. The advantages of the method were verified with laboratory models. The results showed that the proposed method achieves better segmentation and statistics of structural planes with interfaces and sharp boundaries. The method works well in the identification of joints, fissures, and other structural planes on the Mangshezhai slope in the Three Gorges Reservoir area, China. It provides a stable and effective technique for the identification and segmentation of rock mass structural planes, which is beneficial in engineering practice.
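Nearest-voxel downsampling, as used in the RRHT pre-processing, can be sketched as follows; this is a generic illustration (the grouping scheme and function name are assumptions, not the paper's implementation, and libraries such as Open3D provide tuned versions):

```python
import math
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Keep, for each occupied voxel, the point nearest the voxel centre."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index of the point along each axis.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    kept = []
    for key, pts in buckets.items():
        centre = tuple((k + 0.5) * voxel_size for k in key)
        # Retain the single point closest to the voxel centre.
        kept.append(min(pts, key=lambda q: sum((a - b) ** 2
                                               for a, b in zip(q, centre))))
    return kept
```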
Hand gesture recognition is a popular topic in computer vision and makes human-computer interaction more flexible and convenient. The representation of hand gestures is critical for recognition. In this paper, we propose a new method to measure the similarity between hand gestures and exploit it for hand gesture recognition. Depth maps of hand gestures captured via Kinect sensors are used in our method, from which the 3D hand shapes can be segmented out of cluttered backgrounds. To extract the pattern of salient 3D shape features, we propose a new descriptor, the 3D Shape Context, for 3D hand gesture representation. The 3D Shape Context information of each 3D point is obtained at multiple scales, because both local shape context and global shape distribution are necessary for recognition. The description of all the 3D points constitutes the hand gesture representation, and hand gesture recognition is performed via the dynamic time warping algorithm. Extensive experiments are conducted on multiple benchmark datasets. The experimental results verify that the proposed method is robust to noise, articulated variations, and rigid transformations. Our method outperforms state-of-the-art methods in both accuracy and efficiency.
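The dynamic time warping step can be sketched with the classic dynamic-programming recurrence; this generic version takes any per-element distance, whereas the paper compares multi-scale 3D Shape Context descriptors:

```python
import math

def dtw_distance(seq_a, seq_b, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping distance between two sequences.

    Textbook DP over a (n+1) x (m+1) cost table; `dist` is any
    user-supplied per-element distance.
    """
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Best of insertion, deletion, and match moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

Note how warping absorbs repeated elements: a sequence and its "stuttered" copy have distance zero.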
Because behavior recognition is based on video frame sequences, this paper proposes a behavior recognition algorithm that combines a 3D residual convolutional neural network (R3D) and long short-term memory (LSTM). First, the residual module is extended to three dimensions, which allows features to be extracted in the time and space domains simultaneously. Second, the integrity of the time-domain features is preserved by changing the window size of the pooling layer; at the same time, to overcome the difficulty of network training and the problem of over-fitting, a batch normalization (BN) layer and a dropout layer are added. Because the global average pooling (GAP) layer is affected by the size of the feature map, the network cannot be deepened further, so a convolution layer and a max-pooling layer are added to the R3D network. Finally, because LSTM can memorize information and extract more abstract temporal features, an LSTM network is introduced into the R3D network. Experimental results show that the R3D+LSTM network achieves a 91% recognition rate on the UCF-101 dataset.
This paper presents a method for hand gesture recognition based on 3D point clouds, using digital image processing technology. From the 3D points provided by a depth camera, the system first extracts raw data for the hand. After data segmentation and preprocessing, three kinds of appearance features are extracted: the number of stretched fingers, the angles between fingers, and the area distribution of the gesture region. Based on these features, the system identifies gestures using a decision tree. The experimental results demonstrate that the proposed method recognizes common gestures efficiently and with high accuracy.
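A decision tree over such hand-crafted features might look like the toy rule set below; the thresholds and class names are invented for illustration, since the paper's tree is built from its own segmented features:

```python
def classify_gesture(stretched_fingers: int, max_angle_deg: float) -> str:
    """Toy hand-gesture decision tree over two hand-crafted features.

    Both thresholds and gesture labels are hypothetical, not the
    paper's learned tree.
    """
    if stretched_fingers == 0:
        return "fist"
    if stretched_fingers == 5:
        return "open_palm"
    if stretched_fingers == 2:
        # Wide inter-finger angle distinguishes a V-sign from two
        # parallel pointing fingers.
        return "victory" if max_angle_deg > 20 else "point_two"
    return "other"
```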
Expression, occlusion, and pose variations are three main challenges for 3D face recognition. A novel method is presented to address 3D face recognition using scale-invariant feature transform (SIFT) features on 3D meshes. After preprocessing, shape index extrema on the 3D facial surface are selected as keypoints in the difference scale space, and unstable keypoints are removed in two screening steps. Then, a local coordinate system for each keypoint is established by principal component analysis (PCA). Next, two local geometric features are extracted around each keypoint through the local coordinate system. Additionally, the features are augmented by symmetrization according to the approximate left-right symmetry of the human face. The proposed method is evaluated on the Bosphorus, BU-3DFE, and Gavab databases, with good results on all three. The proposed method thus proves robust to facial expression variations, partial external occlusions, and large pose changes.
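Keypoint selection starts from shape index extrema; under one common convention (the [0, 1] scaling, which is an assumption about the paper's exact formula), the shape index of a surface point is computed from its principal curvatures:

```python
import math

def shape_index(k1: float, k2: float) -> float:
    """Shape index s = 1/2 - (1/pi) * arctan((k1 + k2) / (k1 - k2)),
    for principal curvatures k1 >= k2, in the [0, 1] convention.

    Undefined at umbilic points (k1 == k2); mapped here to 0.5 by
    convention, an implementation choice.
    """
    if k1 == k2:
        return 0.5
    return 0.5 - (1.0 / math.pi) * math.atan((k1 + k2) / (k1 - k2))
```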
In order to find better simplicity measurements for 3D object recognition, a new set of local regularities is developed and tested in a stepwise 3D reconstruction method, including localized minimizing standard deviation of angles (L-MSDA), localized minimizing standard deviation of segment magnitudes (L-MSDSM), localized minimum standard deviation of areas of child faces (L-MSDAF), localized minimum sum of segment magnitudes of common edges (L-MSSM), and localized minimum sum of areas of child faces (L-MSAF). Based on effectiveness measurements in terms of form and size distortions, it is found that combining two local regularities, L-MSDA and L-MSDSM, produces better performance. In addition, the best weightings for combining them are identified as 10% for L-MSDSM and 90% for L-MSDA. The test results show that the combined use of L-MSDA and L-MSDSM with the identified weightings has the potential to be applied in other optimization-based 3D recognition methods to improve their efficacy and robustness.
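The 90%/10% combination of L-MSDA and L-MSDSM can be sketched as a weighted sum of standard deviations; the plain population standard deviation below stands in for the localized variants the paper actually defines:

```python
import math

def stdev(xs):
    """Population standard deviation of a non-empty sequence."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

def combined_regularity(angles, segment_magnitudes,
                        w_msda=0.9, w_msdsm=0.1):
    """Weighted simplicity score (lower is simpler): 90% std of angles
    (stand-in for L-MSDA) + 10% std of segment magnitudes (stand-in
    for L-MSDSM). Weights follow the paper; the score form is ours.
    """
    return w_msda * stdev(angles) + w_msdsm * stdev(segment_magnitudes)
```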
Human motion recognition plays a crucial role in video analysis frameworks. However, a given video may contain a variety of noises, such as an unstable background and redundant actions, that are completely different from the key actions. These noises pose a great challenge to human motion recognition. To solve this problem, we propose a new method based on the 3-Dimensional (3D) Bag of Visual Words (BoVW) framework. Our method consists of two parts. The first is the video action feature extractor, which identifies key actions by analyzing action features. In the video action encoder, by analyzing the action characteristics of a given video, we use a pre-trained deep 3D CNN model to obtain expressive coding information. A classifier with subnetwork nodes is used for the final classification. Extensive experiments demonstrate that our method is highly effective for complex video analysis, achieving state-of-the-art performance on the UCF101 (85.3%) and HMDB51 (54.5%) datasets.
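The BoVW encoding stage, hard-assigning each local descriptor to its nearest codeword and histogramming the assignments, can be sketched as follows; in the paper the descriptors come from a pre-trained deep 3D CNN, so this generic version is an assumption about that step:

```python
def bovw_histogram(descriptors, codebook):
    """Normalized bag-of-visual-words histogram via hard assignment.

    Each descriptor votes for its nearest codeword (squared Euclidean
    distance); the histogram is L1-normalized when non-empty.
    """
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(d, codebook[i])))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```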
This paper explores the possibility of using a multi-core programming model that implements the Cascade Correlation Neural Network technique (CCNNs) to enhance the classification phase of a 3D facial recognition system after extracting robust and distinguishable features. This research provides a comprehensive summary of 3D facial recognition systems, as well as the state of the art in Parallel Cascade Correlation Neural Network methods (PCCNNs). Moreover, it highlights the lack of literature combining distributed and shared memory models, which opens the novel possibility of taking advantage of the strengths of both approaches to construct an efficient parallel computing system for 3D facial recognition.
In this paper, a classification method based on neural networks is presented for the recognition of 3D objects. The objective is to classify an object query against objects in a database, which leads to recognition of the former. The 3D objects in this database are transformations of other objects by one element of the overall transformation group; the set of transformations considered in this work is the general affine group.
Background: The perception of visual forms is crucial for effective interaction with our environment and for the recognition of visual objects. Determining the codes underlying this function is thus a fundamental theoretical objective in the study of visual form perception. The vast majority of research in the field is based on a hypothetico-deductive approach: first a theory is formulated, then predictions are made, and finally experimental tests are conducted. After decades of applying this approach, the field remains far from a consensus on the traits underlying the representation of visual form. Our goal is to determine, without theoretical a priori or any other bias, the information underlying the discrimination and recognition of 3D visual forms in normal human adults. Methods: To this end, the adaptive bubbles technique developed by Wang et al. [2011] is applied to six 3D synthetic objects under views that vary from one test to another. This technique presents stimuli that are partially revealed through Gaussian windows, whose locations are random and whose number is set so as to maintain an established performance criterion. Gradually, the experimental program uses participants' performance to determine the stimulus regions participants rely on to recognize objects. The synthetic objects used in this study are unfamiliar and were generated with a program produced at C. Edward Connor's lab, Johns Hopkins University School of Medicine. Results: The results were integrated across participants to establish the regions of the presented stimuli that determine the observers' ability to recognize them, i.e., the diagnostic attributes. The results will be reported graphically as a Z-score mapping superimposed on silhouettes of the objects presented during the experiment. This mapping quantifies the importance of the different regions of an object's visible surface for its recognition by the participants. Conclusions: The diagnostic attributes identified are best described in terms of surface fragments. Some of these fragments are located on or near the outer edge of the stimulus, while others are relatively distant from it. Overlap between the effective attributes for different viewpoints of the same object is minimal. This suggests that the traits underlying object recognition are viewpoint-specific; in other words, they do not generalize across viewpoints.
In order to take advantage of the logical structure of video sequences and improve the recognition accuracy of human actions, a novel hybrid human action detection method based on three descriptors and decision-level fusion is proposed. First, the minimal 3D space region of the human action is detected by combining the frame difference method and the ViBE algorithm, and the three-dimensional histogram of oriented gradients (HOG3D) is extracted. At the same time, global descriptors based on frequency domain filtering (FDF) and local descriptors based on spatial-temporal interest points (STIP) are extracted. Principal component analysis (PCA) is applied to reduce the dimensionality of the gradient histogram and the global descriptor, and a bag-of-words (BoW) model is applied to describe the STIP-based local descriptors. Finally, a linear support vector machine (SVM) is used to create a new decision-level fusion classifier. Experiments verify the performance of the multiple features, and the results show that they have good representation and generalization ability. Moreover, the proposed scheme obtains very competitive results on well-known datasets in terms of mean average precision.
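Decision-level fusion can be illustrated with a simple weighted-sum rule over per-classifier class scores; note this is a stand-in sketch, since the paper instead trains a linear SVM as its fusion classifier:

```python
def fuse_scores(score_lists, weights):
    """Decision-level fusion by weighted averaging of class scores.

    `score_lists` holds one score vector per base classifier;
    `weights` holds one weight per classifier. Returns the index of
    the winning class. The weighted-sum rule is an illustrative
    substitute for the paper's SVM fusion stage.
    """
    n_classes = len(score_lists[0])
    fused = [sum(w * s[c] for w, s in zip(weights, score_lists))
             for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: fused[c])
```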
An action recognition network that combines multi-level spatiotemporal feature fusion with an attention mechanism is proposed as a solution to the issues of single-scale spatiotemporal feature extraction, information redundancy, and insufficient extraction of channel frequency-domain information in 3D convolutional neural networks. First, based on 3D CNNs, this paper designs a new multi-level spatiotemporal feature fusion (MSF) structure, embedded in the network model, which, mainly through multi-level spatiotemporal feature separation, splicing, and fusion, fuses spatial perceptual fields with short-, medium-, and long-range time series information at different scales while reducing network parameters. Second, a multi-frequency channel and spatiotemporal attention module (FSAM) is introduced to assign corresponding weights to the different frequency features and spatiotemporal features in the channels, reducing the information redundancy of the feature maps. Finally, we embed the proposed method into the R3D model, which replaces the 2D convolutional filters in the 2D ResNet with 3D convolutional filters, and conduct extensive experimental validation on the small-to-medium-sized UCF101 dataset and the large-sized Kinetics-400 dataset. The findings reveal that our model increases recognition accuracy on both datasets. Results on the UCF101 dataset, in particular, demonstrate that our model outperforms R3D with a maximum recognition accuracy improvement of 7.2% while using 34.2% fewer parameters. The MSF and FSAM modules were also migrated to another traditional 3D action recognition model, C3D, for application testing. The test results on UCF101 show that recognition accuracy improves by 8.9%, proving the strong generalization ability and universality of the method in this paper.
Airborne LIDAR can flexibly obtain point cloud data with three-dimensional structural information, which can improve the effectiveness of automatic target recognition in complex environments. Compared with 2D information, 3D information performs better at separating objects from the background. However, the aircraft platform can negatively affect the LIDAR data because of varying flight attitudes, flight heights, and atmospheric disturbances. A global-feature-based 3D automatic target recognition method for airborne LIDAR is proposed, composed of an offline phase and an online phase. The performance of four global feature descriptors is compared. Considering the summed volume region (SVR) discrepancy of real objects, SVR selection is added to the pre-processing operations to eliminate clusters that mismatch the target of interest. Highly reliable simulated data are obtained under various sensor altitudes, detection distances, and atmospheric disturbances. The final experimental results show that the added step increases the recognition rate by over 2.4% and decreases the execution time by about 33%.
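The SVR selection step, discarding candidate clusters whose summed volume is too far from the target's, might be sketched as below; the relative-tolerance form and its default value are assumptions, not the paper's threshold:

```python
def filter_by_volume(clusters, target_volume, tolerance=0.5):
    """Keep clusters whose summed volume is within a relative tolerance
    of the target volume; the rest are treated as mismatches.

    The 50% default tolerance is illustrative only.
    """
    return [c for c in clusters
            if abs(c["volume"] - target_volume) <= tolerance * target_volume]
```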
Two new recognition methods for spatial planar polygons using perspective invariants are presented. The cross-ratio (Rc) of a vertex and the co-base area ratio (RA) of an edge in a spatial planar polygon are proposed and used as the invariant primitives of the recognition eigenvector. The second distance error decision rule (SDEDR), which estimates the relative error of RA, is also introduced. The methods can recognize a spatial planar polygon with an arbitrary orientation from only a single perspective view. Experimental examples are given.
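The underlying perspective invariant here is the cross-ratio of four collinear points; a minimal sketch using the common (AC·BD)/(BC·AD) convention follows (the paper's vertex cross-ratio Rc builds on this invariant, though its exact construction is not reproduced):

```python
def cross_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross-ratio of four collinear points given by 1D parameters
    along the line: (AC * BD) / (BC * AD).

    This quantity is invariant under projective (hence perspective)
    transformations of the line.
    """
    return ((c - a) * (d - b)) / ((c - b) * (d - a))
```

For example, equally spaced points always have cross-ratio 4/3, and the value is unchanged by an affine remapping of the line.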
In many animal-related studies, a high-performance animal behavior recognition system can help researchers reduce or eliminate the limitations of human assessment and make experiments easier to reproduce. Although deep learning models now hold state-of-the-art performance in human action recognition tasks, these models are not well studied in application to animal behavior recognition. One reason is the lack of the extensive datasets required to train such deep models to good performance. In this research, we investigated two current state-of-the-art deep learning models for human action recognition, the I3D model and the R(2+1)D model, on a mouse behavior recognition task. We compared their performance with models from previous research, and the results showed that deep learning models pre-trained on human action datasets and then fine-tuned on the mouse behavior dataset can outperform the earlier models. This also shows promise for applying these deep learning models to other animal behavior recognition tasks without significant modification of the models' architecture: all that is needed is to collect proper datasets for the task and fine-tune the pre-trained models on the collected data.
Funding (IoV road network paper): the National Natural Science Foundation of China (Nos. 62272063, 62072056 and 61902041); the Natural Science Foundation of Hunan Province (Nos. 2022JJ30617 and 2020JJ2029); the Open Research Fund of the Key Lab of Broadband Wireless Communication and Sensor Network Technology, Nanjing University of Posts and Telecommunications (No. JZNY202102); the Traffic Science and Technology Project of Hunan Province, China (No. 202042); the Hunan Provincial Key Research and Development Program (No. 2022GK2019); this work was also funded by the Researchers Supporting Project (No. RSPD2023R681), King Saud University, Riyadh, Saudi Arabia.
Funding (intelligent garments paper): Supported by the National Natural Science Foundation of China (62202346); the Hubei Key Research and Development Program (2021BAA042); the Open Project of the Engineering Research Center of Hubei Province for Clothing Information (2022HBCI01); the Wuhan Applied Basic Frontier Research Project (2022013988065212); MIIT's AI Industry Innovation Task flagship projects (Key Technologies, Equipment, and Systems for Flexible Customized and Intelligent Manufacturing in the Clothing Industry); and the Hubei Science and Technology Project of the Safe Production Special Fund (Scene Control Platform Based on Proprioception Information Computing of Artificial Intelligence).
Fund: Supported by the National Natural Science Foundation of China (51909136), the Open Research Fund of the Key Laboratory of Geological Hazards on Three Gorges Reservoir Area (China Three Gorges University), Ministry of Education (Grant No. 2022KDZ21), and the Fund of National Major Water Conservancy Project Construction (0001212022CC60001).
Abstract: The staggered distribution of joints and fissures in space constitutes the weak part of any rock mass. The identification of rock mass structural planes and the extraction of their characteristic parameters are the basis of rock-mass integrity evaluation, which is very important for the analysis of slope stability. The laser scanning technique can be used to acquire the coordinate information of each point on a structural plane, but the large amount of point cloud data, uneven density distribution, and noise interference limit the efficiency and accuracy with which different types of structural planes can be identified by point cloud analysis. A new point cloud identification and segmentation algorithm for rock mass structural surfaces is proposed. Based on the distribution states of the original point cloud in different spatial neighborhoods, the point clouds are characterized by multi-dimensional eigenvalues and calculated by the robust randomized Hough transform (RRHT). The normal vector difference and the final eigenvalue are proposed for characteristic distinction, and the identification of rock mass structural surfaces is completed through region growing, which strengthens the difference expression of the point clouds. In addition, nearest-voxel downsampling is introduced into the RRHT calculation, which further reduces the number of neighborhood noise sources, thereby improving the accuracy and stability of the calculation. The advantages of the method have been verified on laboratory models. The results showed that the proposed method can better achieve the segmentation and statistics of structural planes with interfaces and sharp boundaries. The method works well in the identification of joints, fissures, and other structural planes on the Mangshezhai slope in the Three Gorges Reservoir area, China. It can provide a stable and effective technique for the identification and segmentation of rock mass structural planes, which is beneficial in engineering practice.
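The voxel downsampling step in the abstract above can be illustrated with a minimal sketch. The snippet below keeps one centroid per occupied voxel, a common simplification of the "nearest voxel" variant the paper describes (which snaps to the point nearest the voxel center); the function name and voxel size are illustrative.

```python
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Reduce a point cloud by keeping one representative point
    (the centroid) per occupied cubic voxel of edge `voxel_size`."""
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append((x, y, z))
    downsampled = []
    for pts in buckets.values():
        n = len(pts)
        downsampled.append((sum(p[0] for p in pts) / n,
                            sum(p[1] for p in pts) / n,
                            sum(p[2] for p in pts) / n))
    return downsampled

# Eight points clustered in two distant regions collapse to two centroids.
cloud = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.1), (0.1, 0.2, 0.1), (0.2, 0.2, 0.2),
         (5.1, 5.1, 5.1), (5.2, 5.1, 5.1), (5.1, 5.2, 5.1), (5.2, 5.2, 5.2)]
print(len(voxel_downsample(cloud, 1.0)))  # → 2
```

Because each voxel contributes at most one point, the reduction also evens out the uneven density distribution mentioned in the abstract.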
Fund: Supported by the National Natural Science Foundation of China (61773272, 61976191), the Six Talent Peaks Project of Jiangsu Province, China (XYDXX-053), and the Suzhou Research Project of Technical Innovation, Jiangsu, China (SYG201711).
Abstract: Hand gesture recognition is a popular topic in computer vision and makes human-computer interaction more flexible and convenient. The representation of hand gestures is critical for recognition. In this paper, we propose a new method to measure the similarity between hand gestures and exploit it for hand gesture recognition. The depth maps of hand gestures captured via Kinect sensors are used in our method, where the 3D hand shapes can be segmented from cluttered backgrounds. To extract the pattern of salient 3D shape features, we propose a new descriptor, 3D Shape Context, for 3D hand gesture representation. The 3D Shape Context information of each 3D point is obtained at multiple scales, because both local shape context and global shape distribution are necessary for recognition. The descriptions of all the 3D points construct the hand gesture representation, and hand gesture recognition is performed via the dynamic time warping algorithm. Extensive experiments are conducted on multiple benchmark datasets. The experimental results verify that the proposed method is robust to noise, articulated variations, and rigid transformations. Our method outperforms state-of-the-art methods in comparisons of accuracy and efficiency.
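Matching gestures via dynamic time warping, as in the abstract above, reduces to optimally aligning two feature sequences of possibly different lengths. A minimal sketch of the classic DTW recurrence follows, using scalar features and an absolute-difference cost (both simplifications of the multi-scale 3D Shape Context descriptors the paper actually compares):

```python
def dtw_distance(seq_a, seq_b, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping distance between two sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in seq_a only
                                 cost[i][j - 1],      # step in seq_b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

# A time-stretched copy of a sequence stays at distance zero under DTW,
# which is why DTW suits gestures performed at different speeds.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # → 0.0
print(dtw_distance([1, 2, 3], [2, 3, 4]))           # → 2.0
```

In the paper's setting each sequence element would be a 3D Shape Context descriptor and `dist` a vector distance; the recurrence is unchanged.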
Fund: Supported by the Shaanxi Province Key Research and Development Project (No. 2021GY-280), the Shaanxi Province Natural Science Basic Research Program (No. 2021JM-459), and the National Natural Science Foundation of China (No. 61772417).
Abstract: Because behavior recognition is based on video frame sequences, this paper proposes a behavior recognition algorithm that combines a 3D residual convolutional neural network (R3D) and long short-term memory (LSTM). First, the residual module is extended to three dimensions, so that features in the time and space domains can be extracted at the same time. Second, the integrity of the time-domain features is preserved by changing the size of the pooling-layer window; at the same time, to overcome the difficulty of network training and the problem of over-fitting, batch normalization (BN) and dropout layers are added. After that, because the global average pooling (GAP) layer is affected by the size of the feature map and the network cannot be further deepened, convolution and max-pooling layers are added to the R3D network. Finally, because LSTM has the ability to memorize information and can extract more abstract temporal features, an LSTM network is introduced into the R3D network. Experimental results show that the R3D+LSTM network achieves a 91% recognition rate on the UCF-101 dataset.
Abstract: This paper presents a method for hand gesture recognition based on 3D point clouds. Digital image processing technology is used in this research. Based on the 3D points from a depth camera, the system first extracts raw data of the hand. After data segmentation and preprocessing, three kinds of appearance features are extracted: the number of stretched fingers, the angles between fingers, and the area-distribution feature of the gesture region. Based on these features, the system identifies gestures using a decision tree method. The experimental results demonstrate that the proposed method is efficient at recognizing common gestures with high accuracy.
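A decision tree over such appearance features amounts to a few nested threshold tests. The sketch below is illustrative only: the thresholds, the two-feature encoding, and the gesture labels are hypothetical, not taken from the paper.

```python
def classify_gesture(stretched_fingers, max_angle_deg):
    """Tiny hand-written decision tree over two appearance features:
    the number of stretched fingers and the largest inter-finger angle.
    All split thresholds and class labels are hypothetical."""
    if stretched_fingers == 0:
        return "fist"
    if stretched_fingers == 1:
        return "pointing"
    if stretched_fingers == 2:
        # a wide angle between the two fingers suggests a V sign
        return "victory" if max_angle_deg > 20 else "two-finger point"
    return "open hand" if stretched_fingers == 5 else "other"

print(classify_gesture(2, 35))  # → victory
print(classify_gesture(0, 0))   # → fist
```

In practice such a tree would be learned from labeled examples (e.g. by CART-style splitting) rather than written by hand, but the resulting classifier has exactly this nested-threshold structure.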
Fund: Supported by Project XDA06020300 of the "Strategic Priority Research Program" of the Chinese Academy of Sciences, and by Project 12511501700, Research on the Key Technology of the Internet of Things for Urban Community Safety Based on Video Sensor Networks.
Abstract: Expression, occlusion, and pose variations are three main challenges for 3D face recognition. A novel method is presented to address 3D face recognition using scale-invariant feature transform (SIFT) features on 3D meshes. After preprocessing, shape index extrema on the 3D facial surface are selected as keypoints in the difference scale space, and unstable keypoints are removed after two screening steps. Then, a local coordinate system for each keypoint is established by principal component analysis (PCA). Next, two local geometric features are extracted around each keypoint through the local coordinate system. Additionally, the features are augmented by symmetrization, exploiting the approximate left-right symmetry of the human face. The proposed method is evaluated on the Bosphorus, BU-3DFE, and Gavab databases, and good results are achieved on all three datasets. The proposed method thus proves robust to facial expression variations, partial external occlusions, and large pose changes.
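The PCA step that establishes a local coordinate system amounts to finding the principal directions of the points around a keypoint. Below is a minimal sketch that recovers only the dominant axis via power iteration on the covariance matrix; a full implementation would recover all three axes (e.g. by eigendecomposition), and the function name and data are illustrative.

```python
def principal_axis(points, iters=200):
    """Dominant principal direction of a 3D point set, found by power
    iteration on the 3x3 covariance matrix (a minimal stand-in for PCA)."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - mean[k] for k in range(3)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)]
           for i in range(3)]
    v = [1.0, 1.0, 1.0]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Points spread along the direction (1, 0.1, -0.05) yield that axis
# (up to sign) as the dominant principal direction.
pts = [(x, 0.1 * x, -0.05 * x) for x in range(-5, 6)]
axis = principal_axis(pts)
print([round(abs(a), 2) for a in axis])  # → [0.99, 0.1, 0.05]
```

Anchoring the local frame to these data-driven axes is what makes the subsequently extracted features invariant to the mesh's global pose.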
Abstract: In order to find better simplicity measurements for 3D object recognition, a new set of local regularities is developed and tested in a stepwise 3D reconstruction method, including localized minimum standard deviation of angles (L-MSDA), localized minimum standard deviation of segment magnitudes (L-MSDSM), localized minimum standard deviation of areas of child faces (L-MSDAF), localized minimum sum of segment magnitudes of common edges (L-MSSM), and localized minimum sum of areas of child faces (L-MSAF). Based on their effectiveness measurements in terms of form and size distortions, it is found that two of the local regularities, L-MSDA and L-MSDSM, produce better performance when combined. In addition, the best weightings for them to work together are identified as 10% for L-MSDSM and 90% for L-MSDA. The test results show that the combined usage of L-MSDA and L-MSDSM with the identified weightings has the potential to be applied in other optimization-based 3D recognition methods to improve their efficacy and robustness.
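The combined measurement can be sketched as a weighted sum of the two standard deviations, using the 90%/10% split identified above. How the two terms are normalized against each other is not specified in the abstract, so the direct sum below is an assumption for illustration:

```python
from math import sqrt

def std_dev(values):
    """Population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / n)

def combined_regularity(angles_deg, segment_lengths,
                        w_msda=0.9, w_msdsm=0.1):
    """Lower is 'simpler': weighted sum of the angle spread (L-MSDA term)
    and the segment-magnitude spread (L-MSDSM term), with the 90%/10%
    weighting identified in the text. Normalisation is illustrative."""
    return w_msda * std_dev(angles_deg) + w_msdsm * std_dev(segment_lengths)

# A near-regular candidate reconstruction (equal angles, equal edges)
# scores lower than a distorted one, so minimization prefers it.
regular = combined_regularity([90, 90, 90, 90], [1, 1, 1, 1])
skewed = combined_regularity([60, 120, 80, 100], [1, 3, 1, 2])
print(regular, skewed)
```

In the stepwise reconstruction this score would be minimized over candidate child faces, steering the search toward the most regular interpretation.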
Abstract: Human motion recognition plays a crucial role in video analysis frameworks. However, a given video may contain a variety of noises, such as an unstable background and redundant actions, that are completely different from the key actions. These noises pose a great challenge to human motion recognition. To solve this problem, we propose a new method based on the 3-Dimensional (3D) Bag of Visual Words (BoVW) framework. Our method includes two parts: the first part is the video action feature extractor, which can identify key actions by analyzing action features. In the video action encoder, by analyzing the action characteristics of a given video, we use a pre-trained deep 3D CNN model to obtain expressive coding information. A classifier with subnetwork nodes is used for the final classification. Extensive experiments demonstrate that our method is highly effective for complex video analysis, achieving state-of-the-art performance on the UCF101 (85.3%) and HMDB51 (54.5%) datasets.
Abstract: This paper explores the possibility of using a multi-core programming model that implements the Cascade Correlation Neural Networks technique (CCNNs) to enhance the classification phase of a 3D facial recognition system, after extracting robust and distinguishable features. This research provides a comprehensive summary of 3D facial recognition systems, as well as the state of the art for Parallel Cascade Correlation Neural Networks methods (PCCNNs). Moreover, it highlights the lack of literature combining the distributed and shared memory models, which points to the novel possibility of taking advantage of the strengths of both approaches in order to construct an efficient parallel computing system for 3D facial recognition.
Abstract: In this paper, a classification method based on neural networks is presented for the recognition of 3D objects. The objective is to classify a query object against objects in a database, which leads to recognition of the former. The 3D objects in this database are transformations of other objects by one element of an overall transformation group; the set of transformations considered in this work is the general affine group.
Abstract: Background: The perception of visual forms is crucial for effective interactions with our environment and for the recognition of visual objects. Thus, determining the codes underlying this function is a fundamental theoretical objective in the study of visual form perception. The vast majority of research in the field is based on a hypothetico-deductive approach: we first formulate a theory, then make predictions, and finally conduct experimental tests. After decades of application of this approach, the field remains far from a consensus as to the traits underlying the representation of visual form. Our goal is to determine, without theoretical a priori or any bias whatsoever, the information underlying the discrimination and recognition of 3D visual forms in normal human adults. Methods: To this end, the adaptive bubbles technique developed by Wang et al. [2011] is applied to six 3D synthetic objects under views that vary from one test to another. This technique is based on the presentation of stimuli that are partially revealed through Gaussian windows, the locations of which are random and the number of which is set so as to maintain an established performance criterion. Gradually, the experimental program uses participants' performance to determine the stimulus regions that participants use to recognize objects. The synthetic objects used in this study are unfamiliar and were generated with a program produced at C. Edward Connor's lab, Johns Hopkins University School of Medicine. Results: The results were integrated across participants to establish the regions of the presented stimuli that determine the observers' ability to recognize them, i.e., diagnostic attributes. The results are reported in graphical form as a Z-score mapping superimposed on silhouettes of the objects presented during the experiment. This mapping makes it possible to quantify the importance of the different regions on the visible surface of an object for its recognition by the participants. Conclusions: The diagnostic attributes that were identified are best described in terms of surface fragments. Some of these fragments are located on or near the outer edge of the stimulus, while others are relatively distant from it. The overlap between the effective attributes for different viewpoints of the same object is minimal. This suggests that the traits underlying object recognition are viewpoint-specific; in other words, they do not generalize across viewpoints.
Fund: Supported by the National Natural Science Foundation of China under Grant No. 61503424, the Research Project of the State Ethnic Affairs Commission under Grant No. 14ZYZ017, the Jiangsu Future Networks Innovation Institute Prospective Research Project on Future Networks under Grant No. BY2013095-2-14, the Fundamental Research Funds for the Central Universities No. FRF-TP-14-046A2, and the First-Class Discipline Construction Transitional Funds of Minzu University of China.
Abstract: In order to take advantage of the logical structure of video sequences and improve the recognition accuracy of human actions, a novel hybrid human action detection method based on three descriptors and decision-level fusion is proposed. Firstly, the minimal 3D space region of the human action is detected by combining the frame-difference method and the ViBE algorithm, and the three-dimensional histogram of oriented gradients (HOG3D) is extracted. At the same time, global descriptors based on frequency-domain filtering (FDF) and local descriptors based on spatial-temporal interest points (STIP) are extracted. Principal component analysis (PCA) is applied to reduce the dimensionality of the gradient histogram and the global descriptor, and a bag-of-words (BoW) model is applied to describe the local STIP descriptors. Finally, a linear support vector machine (SVM) is used to create a new decision-level fusion classifier. Experiments verify the performance of the multiple features, and the results show that they have good representation and generalization ability. Moreover, the proposed scheme obtains very competitive results on well-known datasets in terms of mean average precision.
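The BoW encoding of the STIP descriptors can be sketched as nearest-codeword quantization followed by histogram normalization. The 2D descriptors and two-word codebook below are toy stand-ins for the real, typically k-means-learned vocabulary over high-dimensional STIP descriptors:

```python
def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword (squared
    Euclidean distance) and return the normalized occurrence histogram,
    the standard BoW encoding of a set of local descriptors."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((a - b) ** 2
                                        for a, b in zip(d, codebook[k])))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist]

# Two codewords, four descriptors: two fall near each codeword.
codebook = [(0.0, 0.0), (1.0, 1.0)]
descs = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.0, 0.2)]
print(bow_histogram(descs, codebook))  # → [0.5, 0.5]
```

The resulting fixed-length histogram is what a linear SVM, as in the fusion classifier above, can consume regardless of how many interest points the video produced.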
Fund: Supported by the General Program of the National Natural Science Foundation of China (62272234), the Enterprise Cooperation Project (2022h160), and the Priority Academic Program Development of Jiangsu Higher Education Institutions Project.
Abstract: An action recognition network that combines multi-level spatiotemporal feature fusion with an attention mechanism is proposed as a solution to the issues of single-scale spatiotemporal feature extraction, information redundancy, and insufficient extraction of frequency-domain information in the channels of 3D convolutional neural networks. Firstly, based on 3D CNNs, this paper designs a new multi-level spatiotemporal feature fusion (MSF) structure, which is embedded in the network model; mainly through multi-level spatiotemporal feature separation, splicing, and fusion, it achieves the fusion of spatial perceptual fields and short-, medium-, and long-range time-series information at different scales with reduced network parameters. In the second step, a multi-frequency channel and spatiotemporal attention module (FSAM) is introduced to assign corresponding weights to the different frequency features and spatiotemporal features in the channels, reducing the information redundancy of the feature maps. Finally, we embed the proposed method into the R3D model, which replaces the 2D convolutional filters in the 2D ResNet with 3D convolutional filters, and conduct extensive experimental validation on the small and medium-sized UCF101 dataset and the large-sized Kinetics-400 dataset. The findings reveal that our model increases the recognition accuracy on both datasets. Results on the UCF101 dataset, in particular, demonstrate that our model outperforms R3D with a maximum recognition accuracy improvement of 7.2% while using 34.2% fewer parameters. The MSF and FSAM modules are migrated to another traditional 3D action recognition model, C3D, for application testing. The test results on UCF101 show that the recognition accuracy is improved by 8.9%, proving the strong generalization ability and universality of the method in this paper.
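At its core, the channel re-weighting performed by an attention module such as the FSAM described above reduces to turning per-channel relevance scores into normalized weights. The sketch below shows only that softmax step; it omits the frequency-domain feature extraction that would produce the scores, and the function name and scores are illustrative:

```python
from math import exp

def channel_attention(channel_scores):
    """Softmax re-weighting of per-channel relevance scores: channels
    with higher scores receive larger weights, and the weights sum to 1.
    (A real attention module would compute the scores from the feature
    maps; here they are given directly.)"""
    m = max(channel_scores)                       # subtract max for stability
    exps = [exp(s - m) for s in channel_scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = channel_attention([2.0, 1.0, 0.1])
print([round(w, 3) for w in weights])
```

Multiplying each channel's feature map by its weight then suppresses redundant channels while preserving the informative ones, which is the redundancy reduction the abstract describes.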
Fund: This research was supported by the National Natural Science Foundation of China (No. 61271353, 61871389), the Major Funding Projects of the National University of Defense Technology (No. ZK18-01-02), and the Foundation of the State Key Laboratory of Pulsed Power Laser Technology (No. SKL2018ZR09).
Abstract: Airborne LIDAR can flexibly obtain point cloud data with three-dimensional structural information, which can improve the effectiveness of automatic target recognition in complex environments. Compared with 2D information, 3D information performs better in separating objects from the background. However, an aircraft platform can negatively influence the LIDAR data because of varying flight attitudes, flight heights, and atmospheric disturbances. A global-feature-based 3D automatic target recognition method for airborne LIDAR is proposed, composed of an offline phase and an online phase. The performance of four global feature descriptors is compared. Considering the summed volume region (SVR) discrepancy among real objects, SVR selection is added to the pre-processing operations to eliminate clusters that mismatch the target of interest. Highly reliable simulated data are obtained under various sensor altitudes, detection distances, and atmospheric disturbances. The final experimental results show that the added step increases the recognition rate by more than 2.4% and decreases the execution time by about 33%.
Abstract: Two new recognition methods for spatial planar polygons using perspective invariants are presented. The cross-ratio (Rc) of a vertex and the co-base area ratio (RA) of an edge in a spatial planar polygon are proposed and used as the invariant primitives of the recognition eigenvector. A second distance error decision rule (SDEDR) estimating the relative error of RA is also introduced. The methods can recognize a spatial planar polygon with an arbitrary orientation from only a single perspective view. Experimental examples are given.
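The perspective invariance such methods rely on can be demonstrated with the classic cross-ratio of four collinear points, which is unchanged by any projective (fractional linear) map. This is a textbook illustration of the principle, not the paper's vertex cross-ratio construction; the coordinates and the map below are arbitrary:

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four distinct collinear points given as 1D
    coordinates. It is preserved under perspective projection, which is
    what makes it usable as a recognition invariant."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

pts = [0.0, 1.0, 2.0, 4.0]
# An arbitrary projective map x -> (2x + 1) / (x + 3):
proj = [(2 * x + 1) / (x + 3) for x in pts]
print(round(cross_ratio(*pts), 6), round(cross_ratio(*proj), 6))  # → 1.5 1.5
```

Because the value survives the projection, it can be computed from a single perspective view and matched against values stored for known polygons, mirroring the single-view recognition described above.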
Abstract: In many animal-related studies, a high-performance animal behavior recognition system can help researchers reduce or eliminate the limitations of human assessment and make experiments easier to reproduce. Although deep learning models currently hold state-of-the-art performance in human action recognition tasks, these models are not well studied for animal behavior recognition tasks. One reason is the lack of the extensive datasets required to train such deep models to good performance. In this research, we investigated two current state-of-the-art deep learning models for human action recognition, the I3D model and the R(2+1)D model, on a mouse behavior recognition task. We compared their performance with models from previous research, and the results showed that deep learning models pre-trained on human action datasets and then fine-tuned on the mouse behavior dataset can outperform the models from previous research. This shows promise for applying these deep learning models to other animal behavior recognition tasks without any significant modification of the models' architecture; all that is needed is to collect proper datasets for the tasks and fine-tune the pre-trained models on the collected data.