Deep learning, especially through convolutional neural networks (CNN) such as the U-Net 3D model, has revolutionized fault identification from seismic data, representing a significant leap over traditional methods. Ou...Deep learning, especially through convolutional neural networks (CNN) such as the U-Net 3D model, has revolutionized fault identification from seismic data, representing a significant leap over traditional methods. Our review traces the evolution of CNN, emphasizing the adaptation and capabilities of the U-Net 3D model in automating seismic fault delineation with unprecedented accuracy. We find: 1) The transition from basic neural networks to sophisticated CNN has enabled remarkable advancements in image recognition, which are directly applicable to analyzing seismic data. The U-Net 3D model, with its innovative architecture, exemplifies this progress by providing a method for detailed and accurate fault detection with reduced manual interpretation bias. 2) The U-Net 3D model has demonstrated its superiority over traditional fault identification methods in several key areas: it has enhanced interpretation accuracy, increased operational efficiency, and reduced the subjectivity of manual methods. 3) Despite these achievements, challenges such as the need for effective data preprocessing, acquisition of high-quality annotated datasets, and achieving model generalization across different geological conditions remain. Future research should therefore focus on developing more complex network architectures and innovative training strategies to refine fault identification performance further. Our findings confirm the transformative potential of deep learning, particularly CNN like the U-Net 3D model, in geosciences, advocating for its broader integration to revolutionize geological exploration and seismic analysis.展开更多
<div style="text-align:justify;"> Load identification method is one of the major technical difficulties of non-intrusive composite monitoring. Binary V-I trajectory image can reflect the original V-I t...<div style="text-align:justify;"> Load identification method is one of the major technical difficulties of non-intrusive composite monitoring. Binary V-I trajectory image can reflect the original V-I trajectory characteristics to a large extent, so it is widely used in load identification. However, using single binary V-I trajectory feature for load identification has certain limitations. In order to improve the accuracy of load identification, the power feature is added on the basis of the binary V-I trajectory feature in this paper. We change the initial binary V-I trajectory into a new 3D feature by mapping the power feature to the third dimension. In order to reduce the impact of imbalance samples on load identification, the SVM SMOTE algorithm is used to balance the samples. Based on the deep learning method, the convolutional neural network model is used to extract the newly produced 3D feature to achieve load identification in this paper. The results indicate the new 3D feature has better observability and the proposed model has higher identification performance compared with other classification models on the public data set PLAID. </div>展开更多
In this paper, the complete process of constructing 3D digital core by fullconvolutional neural network is described carefully. A large number of sandstone computedtomography (CT) images are used as training input for...In this paper, the complete process of constructing 3D digital core by fullconvolutional neural network is described carefully. A large number of sandstone computedtomography (CT) images are used as training input for a fully convolutional neural networkmodel. This model is used to reconstruct the three-dimensional (3D) digital core of Bereasandstone based on a small number of CT images. The Hamming distance together with theMinkowski functions for porosity, average volume specifi c surface area, average curvature,and connectivity of both the real core and the digital reconstruction are used to evaluate theaccuracy of the proposed method. The results show that the reconstruction achieved relativeerrors of 6.26%, 1.40%, 6.06%, and 4.91% for the four Minkowski functions and a Hammingdistance of 0.04479. This demonstrates that the proposed method can not only reconstructthe physical properties of real sandstone but can also restore the real characteristics of poredistribution in sandstone, is the ability to which is a new way to characterize the internalmicrostructure of rocks.展开更多
The micro-expression lasts for a very short time and the intensity is very subtle.Aiming at the problem of its low recognition rate,this paper proposes a new micro-expression recognition algorithm based on a three-dim...The micro-expression lasts for a very short time and the intensity is very subtle.Aiming at the problem of its low recognition rate,this paper proposes a new micro-expression recognition algorithm based on a three-dimensional convolutional neural network(3D-CNN),which can extract two-di-mensional features in spatial domain and one-dimensional features in time domain,simultaneously.The network structure design is based on the deep learning framework Keras,and the discarding method and batch normalization(BN)algorithm are effectively combined with three-dimensional vis-ual geometry group block(3D-VGG-Block)to reduce the risk of overfitting while improving training speed.Aiming at the problem of the lack of samples in the data set,two methods of image flipping and small amplitude flipping are used for data amplification.Finally,the recognition rate on the data set is as high as 69.11%.Compared with the current international average micro-expression recog-nition rate of about 67%,the proposed algorithm has obvious advantages in recognition rate.展开更多
Image based individual dairy cattle recognition has gained much attention recently. In order to further improve the accuracy of individual dairy cattle recognition, an algorithm based on deep convolutional neural netw...Image based individual dairy cattle recognition has gained much attention recently. In order to further improve the accuracy of individual dairy cattle recognition, an algorithm based on deep convolutional neural network( DCNN) is proposed in this paper,which enables automatic feature extraction and classification that outperforms traditional hand craft features. Through making multigroup comparison experiments including different network layers,different sizes of convolution kernel and different feature dimensions in full connection layer,we demonstrate that the proposed method is suitable for dairy cattle classification. The experimental results show that the accuracy is significantly higher compared to two traditional image processing algorithms: scale invariant feature transform( SIFT) algorithm and bag of feature( BOF) model.展开更多
In this study,we examined the efficacy of a deep convolutional neural network(DCNN)in recognizing concrete surface images and predicting the compressive strength of concrete.A digital single-lens reflex(DSLR)camera an...In this study,we examined the efficacy of a deep convolutional neural network(DCNN)in recognizing concrete surface images and predicting the compressive strength of concrete.A digital single-lens reflex(DSLR)camera and microscope were simultaneously used to obtain concrete surface images used as the input data for the DCNN.Thereafter,training,validation,and testing of the DCNNs were performed based on the DSLR camera and microscope image data.Results of the analysis indicated that the DCNN employing DSLR image data achieved a relatively higher accuracy.The accuracy of the DSLR-derived image data was attributed to the relatively wider range of the DSLR camera,which was beneficial for extracting a larger number of features.Moreover,the DSLR camera procured more realistic images than the microscope.Thus,when the compressive strength of concrete was evaluated using the DCNN employing a DSLR camera,time and cost were reduced,whereas the usefulness increased.Furthermore,an indirect comparison of the accuracy of the DCNN with that of existing non-destructive methods for evaluating the strength of concrete proved the reliability of DCNN-derived concrete strength predictions.In addition,it was determined that the DCNN used for concrete strength evaluations in this study can be further expanded to detect and evaluate various deteriorative factors that affect the durability of structures,such as salt damage,carbonation,sulfation,corrosion,and freezing-thawing.展开更多
In this work,a three dimensional(3D)convolutional neural network(CNN)model based on image slices of various normal and pathological vocal folds is proposed for accurate and efficient prediction of glottal flows.The 3D...In this work,a three dimensional(3D)convolutional neural network(CNN)model based on image slices of various normal and pathological vocal folds is proposed for accurate and efficient prediction of glottal flows.The 3D CNN model is composed of the feature extraction block and regression block.The feature extraction block is capable of learning low dimensional features from the high dimensional image data of the glottal shape,and the regression block is employed to flatten the output from the feature extraction block and obtain the desired glottal flow data.The input image data is the condensed set of 2D image slices captured in the axial plane of the 3D vocal folds,where these glottal shapes are synthesized based on the equations of normal vibration modes.The output flow data is the corresponding flow rate,averaged glottal pressure and nodal pressure distributions over the glottal surface.The 3D CNN model is built to establish the mapping between the input image data and output flow data.The ground-truth flow variables of each glottal shape in the training and test datasets are obtained by a high-fidelity sharp-interface immersed-boundary solver.The proposed model is trained to predict the concerned flow variables for glottal shapes in the test set.The present 3D CNN model is more efficient than traditional Computational Fluid Dynamics(CFD)models while the accuracy can still be retained,and more powerful than previous data-driven prediction models because more details of the glottal flow can be provided.The prediction performance of the trained 3D CNN model in accuracy and efficiency indicates that this model could be promising for future clinical applications.展开更多
Tumour segmentation in medical images(especially 3D tumour segmentation)is highly challenging due to the possible similarity between tumours and adjacent tissues,occurrence of multiple tumours and variable tumour shap...Tumour segmentation in medical images(especially 3D tumour segmentation)is highly challenging due to the possible similarity between tumours and adjacent tissues,occurrence of multiple tumours and variable tumour shapes and sizes.The popular deep learning‐based segmentation algorithms generally rely on the convolutional neural network(CNN)and Transformer.The former cannot extract the global image features effectively while the latter lacks the inductive bias and involves the complicated computation for 3D volume data.The existing hybrid CNN‐Transformer network can only provide the limited performance improvement or even poorer segmentation performance than the pure CNN.To address these issues,a short‐term and long‐term memory self‐attention network is proposed.Firstly,a distinctive self‐attention block uses the Transformer to explore the correlation among the region features at different levels extracted by the CNN.Then,the memory structure filters and combines the above information to exclude the similar regions and detect the multiple tumours.Finally,the multi‐layer reconstruction blocks will predict the tumour boundaries.Experimental results demonstrate that our method outperforms other methods in terms of subjective visual and quantitative evaluation.Compared with the most competitive method,the proposed method provides Dice(82.4%vs.76.6%)and Hausdorff distance 95%(HD95)(10.66 vs.11.54 mm)on the KiTS19 as well as Dice(80.2%vs.78.4%)and HD95(9.632 vs.12.17 mm)on the LiTS.展开更多
Background Deep convolutional neural networks have garnered considerable attention in numerous machine learning applications,particularly in visual recognition tasks such as image and video analyses.There is a growing...Background Deep convolutional neural networks have garnered considerable attention in numerous machine learning applications,particularly in visual recognition tasks such as image and video analyses.There is a growing interest in applying this technology to diverse applications in medical image analysis.Automated three dimensional Breast Ultrasound is a vital tool for detecting breast cancer,and computer-assisted diagnosis software,developed based on deep learning,can effectively assist radiologists in diagnosis.However,the network model is prone to overfitting during training,owing to challenges such as insufficient training data.This study attempts to solve the problem caused by small datasets and improve model detection performance.Methods We propose a breast cancer detection framework based on deep learning(a transfer learning method based on cross-organ cancer detection)and a contrastive learning method based on breast imaging reporting and data systems(BI-RADS).Results When using cross organ transfer learning and BIRADS based contrastive learning,the average sensitivity of the model increased by a maximum of 16.05%.Conclusion Our experiments have demonstrated that the parameters and experiences of cross-organ cancer detection can be mutually referenced,and contrastive learning method based on BI-RADS can improve the detection performance of the model.展开更多
Audiovisual speech recognition is an emerging research topic.Lipreading is the recognition of what someone is saying using visual information,primarily lip movements.In this study,we created a custom dataset for India...Audiovisual speech recognition is an emerging research topic.Lipreading is the recognition of what someone is saying using visual information,primarily lip movements.In this study,we created a custom dataset for Indian English linguistics and categorized it into three main categories:(1)audio recognition,(2)visual feature extraction,and(3)combined audio and visual recognition.Audio features were extracted using the mel-frequency cepstral coefficient,and classification was performed using a one-dimension convolutional neural network.Visual feature extraction uses Dlib and then classifies visual speech using a long short-term memory type of recurrent neural networks.Finally,integration was performed using a deep convolutional network.The audio speech of Indian English was successfully recognized with accuracies of 93.67%and 91.53%,respectively,using testing data from 200 epochs.The training accuracy for visual speech recognition using the Indian English dataset was 77.48%and the test accuracy was 76.19%using 60 epochs.After integration,the accuracies of audiovisual speech recognition using the Indian English dataset for training and testing were 94.67%and 91.75%,respectively.展开更多
Recognition of dynamic hand gestures in real-time is a difficult task because the system can never know when or from where the gesture starts and ends in a video stream.Many researchers have been working on visionbase...Recognition of dynamic hand gestures in real-time is a difficult task because the system can never know when or from where the gesture starts and ends in a video stream.Many researchers have been working on visionbased gesture recognition due to its various applications.This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network(3D-CNN)and a Long Short-Term Memory(LSTM)network.The proposed architecture extracts spatial-temporal information from video sequences input while avoiding extensive computation.The 3D-CNN is used for the extraction of spectral and spatial features which are then given to the LSTM network through which classification is carried out.The proposed model is a light-weight architecture with only 3.7 million training parameters.The model has been evaluated on 15 classes from the 20BN-jester dataset available publicly.The model was trained on 2000 video-clips per class which were separated into 80%training and 20%validation sets.An accuracy of 99%and 97%was achieved on training and testing data,respectively.We further show that the combination of 3D-CNN with LSTM gives superior results as compared to MobileNetv2+LSTM.展开更多
In computer vision fields,3D object recognition is one of the most important tasks for many real-world applications.Three-dimensional convolutional neural networks(CNNs)have demonstrated their advantages in 3D object ...In computer vision fields,3D object recognition is one of the most important tasks for many real-world applications.Three-dimensional convolutional neural networks(CNNs)have demonstrated their advantages in 3D object recognition.In this paper,we propose to use the principal curvature directions of 3D objects(using a CAD model)to represent the geometric features as inputs for the 3D CNN.Our framework,namely CurveNet,learns perceptually relevant salient features and predicts object class labels.Curvature directions incorporate complex surface information of a 3D object,which helps our framework to produce more precise and discriminative features for object recognition.Multitask learning is inspired by sharing features between two related tasks,where we consider pose classification as an auxiliary task to enable our CurveNet to better generalize object label classification.Experimental results show that our proposed framework using curvature vectors performs better than voxels as an input for 3D object classification.We further improved the performance of CurveNet by combining two networks with both curvature direction and voxels of a 3D object as the inputs.A Cross-Stitch module was adopted to learn effective shared features across multiple representations.We evaluated our methods using three publicly available datasets and achieved competitive performance in the 3D object recognition task.展开更多
Because behavior recognition is based on video frame sequences,this paper proposes a behavior recognition algorithm that combines 3D residual convolutional neural network(R3D)and long short-term memory(LSTM).First,the...Because behavior recognition is based on video frame sequences,this paper proposes a behavior recognition algorithm that combines 3D residual convolutional neural network(R3D)and long short-term memory(LSTM).First,the residual module is extended to three dimensions,which can extract features in the time and space domain at the same time.Second,by changing the size of the pooling layer window the integrity of the time domain features is preserved,at the same time,in order to overcome the difficulty of network training and over-fitting problems,the batch normalization(BN)layer and the dropout layer are added.After that,because the global average pooling layer(GAP)is affected by the size of the feature map,the network cannot be further deepened,so the convolution layer and maxpool layer are added to the R3D network.Finally,because LSTM has the ability to memorize information and can extract more abstract timing features,the LSTM network is introduced into the R3D network.Experimental results show that the R3D+LSTM network achieves 91%recognition rate on the UCF-101 dataset.展开更多
Depression has become a major health threat around the world,especially for older people,so the effective detection method for depression is a great public health challenge.Electroencephalogram(EEG)can be used as a bi...Depression has become a major health threat around the world,especially for older people,so the effective detection method for depression is a great public health challenge.Electroencephalogram(EEG)can be used as a biomarker to effectively explore depression recognition.Motivated by the studies that multiple smaller scale kernels could increase nonlinear expression compared to a larger kernel,this article proposes a model named the three-dimensional multiscale kernels convolutional neural network model for the depression disorder recognition(3DMKDR),which is a three-dimensional convolutional neural network model with multiscale convolutional kernels for depression recognition based on EEG signals.A three-dimensional structure of the EEG is built by extending one-dimensional feature sequences into a two-dimensional electrode matrix to excavate the related spatiotemporal information among electrodes and the collected electrode matrix.By the major depressive disorder(MDD)and the multi-modal open dataset for mental-disorder analysis(MODMA)datasets,the experiment shows that the accuracies of depression recognition are up to99.86%and 98.01%in the subject-dependent experiment,and 95.80%and 82.27%in the subjectindependent experiment,which are higher than alternative competitive methods.The experimental results demonstrate that the proposed 3DMKDR is potentially useful for depression recognition in older persons in the future.展开更多
While the internet has a lot of positive impact on society,there are negative components.Accessible to everyone through online platforms,pornography is,inducing psychological and health related issues among people of ...While the internet has a lot of positive impact on society,there are negative components.Accessible to everyone through online platforms,pornography is,inducing psychological and health related issues among people of all ages.While a difficult task,detecting pornography can be the important step in determining the porn and adult content in a video.In this paper,an architecture is proposed which yielded high scores for both training and testing.This dataset was produced from 190 videos,yielding more than 19 h of videos.The main sources for the content were from YouTube,movies,torrent,and websites that hosts both pornographic and non-pornographic contents.The videos were from different ethnicities and skin color which ensures the models can detect any kind of video.A VGG16,Inception V3 and Resnet 50 models were initially trained to detect these pornographic images but failed to achieve a high testing accuracy with accuracies of 0.49,0.49 and 0.78 respectively.Finally,utilizing transfer learning,a convolutional neural network was designed and yielded an accuracy of 0.98.展开更多
An action recognition network that combines multi-level spatiotemporal feature fusion with an attention mechanism is proposed as a solution to the issues of single spatiotemporal feature scale extraction,information r...An action recognition network that combines multi-level spatiotemporal feature fusion with an attention mechanism is proposed as a solution to the issues of single spatiotemporal feature scale extraction,information redundancy,and insufficient extraction of frequency domain information in channels in 3D convolutional neural networks.Firstly,based on 3D CNN,this paper designs a new multilevel spatiotemporal feature fusion(MSF)structure,which is embedded in the network model,mainly through multilevel spatiotemporal feature separation,splicing and fusion,to achieve the fusion of spatial perceptual fields and short-medium-long time series information at different scales with reduced network parameters;In the second step,a multi-frequency channel and spatiotemporal attention module(FSAM)is introduced to assign different frequency features and spatiotemporal features in the channels are assigned corresponding weights to reduce the information redundancy of the feature maps.Finally,we embed the proposed method into the R3D model,which replaced the 2D convolutional filters in the 2D Resnet with 3D convolutional filters and conduct extensive experimental validation on the small and medium-sized dataset UCF101 and the largesized dataset Kinetics-400.The findings revealed that our model increased the recognition accuracy on both datasets.Results on the UCF101 dataset,in particular,demonstrate that our model outperforms R3D in terms of a maximum recognition accuracy improvement of 7.2%while using 34.2%fewer parameters.The MSF and FSAM are migrated to another traditional 3D action recognition model named C3D for application testing.The test results based on UCF101 show that the recognition accuracy is improved by 8.9%,proving the strong generalization ability and universality of the method in this paper.展开更多
文摘Deep learning, especially through convolutional neural networks (CNN) such as the U-Net 3D model, has revolutionized fault identification from seismic data, representing a significant leap over traditional methods. Our review traces the evolution of CNN, emphasizing the adaptation and capabilities of the U-Net 3D model in automating seismic fault delineation with unprecedented accuracy. We find: 1) The transition from basic neural networks to sophisticated CNN has enabled remarkable advancements in image recognition, which are directly applicable to analyzing seismic data. The U-Net 3D model, with its innovative architecture, exemplifies this progress by providing a method for detailed and accurate fault detection with reduced manual interpretation bias. 2) The U-Net 3D model has demonstrated its superiority over traditional fault identification methods in several key areas: it has enhanced interpretation accuracy, increased operational efficiency, and reduced the subjectivity of manual methods. 3) Despite these achievements, challenges such as the need for effective data preprocessing, acquisition of high-quality annotated datasets, and achieving model generalization across different geological conditions remain. Future research should therefore focus on developing more complex network architectures and innovative training strategies to refine fault identification performance further. Our findings confirm the transformative potential of deep learning, particularly CNN like the U-Net 3D model, in geosciences, advocating for its broader integration to revolutionize geological exploration and seismic analysis.
文摘<div style="text-align:justify;"> Load identification method is one of the major technical difficulties of non-intrusive composite monitoring. Binary V-I trajectory image can reflect the original V-I trajectory characteristics to a large extent, so it is widely used in load identification. However, using single binary V-I trajectory feature for load identification has certain limitations. In order to improve the accuracy of load identification, the power feature is added on the basis of the binary V-I trajectory feature in this paper. We change the initial binary V-I trajectory into a new 3D feature by mapping the power feature to the third dimension. In order to reduce the impact of imbalance samples on load identification, the SVM SMOTE algorithm is used to balance the samples. Based on the deep learning method, the convolutional neural network model is used to extract the newly produced 3D feature to achieve load identification in this paper. The results indicate the new 3D feature has better observability and the proposed model has higher identification performance compared with other classification models on the public data set PLAID. </div>
基金the National Natural Science Foundation of China(No.41274129)Chuan Qing Drilling Engineering Company's Scientific Research Project:Seismic detection technology and application of complex carbonate reservoir in Sulige Majiagou Formation and the 2018 Central Supporting Local Co-construction Fund(No.80000-18Z0140504)the Construction and Development of Universities in 2019-Joint Support for Geophysics(Double First-Class center,80000-19Z0204)。
文摘In this paper, the complete process of constructing 3D digital core by fullconvolutional neural network is described carefully. A large number of sandstone computedtomography (CT) images are used as training input for a fully convolutional neural networkmodel. This model is used to reconstruct the three-dimensional (3D) digital core of Bereasandstone based on a small number of CT images. The Hamming distance together with theMinkowski functions for porosity, average volume specifi c surface area, average curvature,and connectivity of both the real core and the digital reconstruction are used to evaluate theaccuracy of the proposed method. The results show that the reconstruction achieved relativeerrors of 6.26%, 1.40%, 6.06%, and 4.91% for the four Minkowski functions and a Hammingdistance of 0.04479. This demonstrates that the proposed method can not only reconstructthe physical properties of real sandstone but can also restore the real characteristics of poredistribution in sandstone, is the ability to which is a new way to characterize the internalmicrostructure of rocks.
基金Supported by the Shaanxi Province Key Research and Development Project(No.2021GY-280)Shaanxi Province Natural Science Basic Re-search Program Project(No.2021JM-459)+1 种基金the National Natural Science Foundation of China(No.61834005,61772417,61802304,61602377,61634004)the Shaanxi Province International Science and Technology Cooperation Project(No.2018KW-006).
文摘The micro-expression lasts for a very short time and the intensity is very subtle.Aiming at the problem of its low recognition rate,this paper proposes a new micro-expression recognition algorithm based on a three-dimensional convolutional neural network(3D-CNN),which can extract two-di-mensional features in spatial domain and one-dimensional features in time domain,simultaneously.The network structure design is based on the deep learning framework Keras,and the discarding method and batch normalization(BN)algorithm are effectively combined with three-dimensional vis-ual geometry group block(3D-VGG-Block)to reduce the risk of overfitting while improving training speed.Aiming at the problem of the lack of samples in the data set,two methods of image flipping and small amplitude flipping are used for data amplification.Finally,the recognition rate on the data set is as high as 69.11%.Compared with the current international average micro-expression recog-nition rate of about 67%,the proposed algorithm has obvious advantages in recognition rate.
基金Science and Technology Support Plan Project of Tianjin Municipal Science and Technology Commission(No.15ZCZDNC00130)
文摘Image based individual dairy cattle recognition has gained much attention recently. In order to further improve the accuracy of individual dairy cattle recognition, an algorithm based on deep convolutional neural network( DCNN) is proposed in this paper,which enables automatic feature extraction and classification that outperforms traditional hand craft features. Through making multigroup comparison experiments including different network layers,different sizes of convolution kernel and different feature dimensions in full connection layer,we demonstrate that the proposed method is suitable for dairy cattle classification. The experimental results show that the accuracy is significantly higher compared to two traditional image processing algorithms: scale invariant feature transform( SIFT) algorithm and bag of feature( BOF) model.
基金This work was supported by the National Research Foundation of Korea(NRF)grant funded by the Korea government(MSIT)(NRF-2018R1A2B6007333)This study was supported by 2018 Research Grant from Kangwon National University.
文摘In this study,we examined the efficacy of a deep convolutional neural network(DCNN)in recognizing concrete surface images and predicting the compressive strength of concrete.A digital single-lens reflex(DSLR)camera and microscope were simultaneously used to obtain concrete surface images used as the input data for the DCNN.Thereafter,training,validation,and testing of the DCNNs were performed based on the DSLR camera and microscope image data.Results of the analysis indicated that the DCNN employing DSLR image data achieved a relatively higher accuracy.The accuracy of the DSLR-derived image data was attributed to the relatively wider range of the DSLR camera,which was beneficial for extracting a larger number of features.Moreover,the DSLR camera procured more realistic images than the microscope.Thus,when the compressive strength of concrete was evaluated using the DCNN employing a DSLR camera,time and cost were reduced,whereas the usefulness increased.Furthermore,an indirect comparison of the accuracy of the DCNN with that of existing non-destructive methods for evaluating the strength of concrete proved the reliability of DCNN-derived concrete strength predictions.In addition,it was determined that the DCNN used for concrete strength evaluations in this study can be further expanded to detect and evaluate various deteriorative factors that affect the durability of structures,such as salt damage,carbonation,sulfation,corrosion,and freezing-thawing.
基金supported by the Open Project of Key Laboratory of Computational Aerodynamics,AVIC Aerodynamics Research Institute(Grant No.YL2022XFX0409).
文摘In this work,a three dimensional(3D)convolutional neural network(CNN)model based on image slices of various normal and pathological vocal folds is proposed for accurate and efficient prediction of glottal flows.The 3D CNN model is composed of the feature extraction block and regression block.The feature extraction block is capable of learning low dimensional features from the high dimensional image data of the glottal shape,and the regression block is employed to flatten the output from the feature extraction block and obtain the desired glottal flow data.The input image data is the condensed set of 2D image slices captured in the axial plane of the 3D vocal folds,where these glottal shapes are synthesized based on the equations of normal vibration modes.The output flow data is the corresponding flow rate,averaged glottal pressure and nodal pressure distributions over the glottal surface.The 3D CNN model is built to establish the mapping between the input image data and output flow data.The ground-truth flow variables of each glottal shape in the training and test datasets are obtained by a high-fidelity sharp-interface immersed-boundary solver.The proposed model is trained to predict the concerned flow variables for glottal shapes in the test set.The present 3D CNN model is more efficient than traditional Computational Fluid Dynamics(CFD)models while the accuracy can still be retained,and more powerful than previous data-driven prediction models because more details of the glottal flow can be provided.The prediction performance of the trained 3D CNN model in accuracy and efficiency indicates that this model could be promising for future clinical applications.
基金supported by the National Key Research and Development Program of China under Grant No.2018YFE0206900the National Natural Science Foundation of China under Grant No.61871440 and CAAI‐Huawei Mind-Spore Open Fund.
文摘Tumour segmentation in medical images(especially 3D tumour segmentation)is highly challenging due to the possible similarity between tumours and adjacent tissues,occurrence of multiple tumours and variable tumour shapes and sizes.The popular deep learning‐based segmentation algorithms generally rely on the convolutional neural network(CNN)and Transformer.The former cannot extract the global image features effectively while the latter lacks the inductive bias and involves the complicated computation for 3D volume data.The existing hybrid CNN‐Transformer network can only provide the limited performance improvement or even poorer segmentation performance than the pure CNN.To address these issues,a short‐term and long‐term memory self‐attention network is proposed.Firstly,a distinctive self‐attention block uses the Transformer to explore the correlation among the region features at different levels extracted by the CNN.Then,the memory structure filters and combines the above information to exclude the similar regions and detect the multiple tumours.Finally,the multi‐layer reconstruction blocks will predict the tumour boundaries.Experimental results demonstrate that our method outperforms other methods in terms of subjective visual and quantitative evaluation.Compared with the most competitive method,the proposed method provides Dice(82.4%vs.76.6%)and Hausdorff distance 95%(HD95)(10.66 vs.11.54 mm)on the KiTS19 as well as Dice(80.2%vs.78.4%)and HD95(9.632 vs.12.17 mm)on the LiTS.
基金Macao Polytechnic University Grant(RP/FCSD-01/2022RP/FCA-05/2022)Science and Technology Development Fund of Macao(0105/2022/A).
文摘Background Deep convolutional neural networks have garnered considerable attention in numerous machine learning applications,particularly in visual recognition tasks such as image and video analyses.There is a growing interest in applying this technology to diverse applications in medical image analysis.Automated three dimensional Breast Ultrasound is a vital tool for detecting breast cancer,and computer-assisted diagnosis software,developed based on deep learning,can effectively assist radiologists in diagnosis.However,the network model is prone to overfitting during training,owing to challenges such as insufficient training data.This study attempts to solve the problem caused by small datasets and improve model detection performance.Methods We propose a breast cancer detection framework based on deep learning(a transfer learning method based on cross-organ cancer detection)and a contrastive learning method based on breast imaging reporting and data systems(BI-RADS).Results When using cross organ transfer learning and BIRADS based contrastive learning,the average sensitivity of the model increased by a maximum of 16.05%.Conclusion Our experiments have demonstrated that the parameters and experiences of cross-organ cancer detection can be mutually referenced,and contrastive learning method based on BI-RADS can improve the detection performance of the model.
文摘Audiovisual speech recognition is an emerging research topic.Lipreading is the recognition of what someone is saying using visual information,primarily lip movements.In this study,we created a custom dataset for Indian English linguistics and categorized it into three main categories:(1)audio recognition,(2)visual feature extraction,and(3)combined audio and visual recognition.Audio features were extracted using the mel-frequency cepstral coefficient,and classification was performed using a one-dimension convolutional neural network.Visual feature extraction uses Dlib and then classifies visual speech using a long short-term memory type of recurrent neural networks.Finally,integration was performed using a deep convolutional network.The audio speech of Indian English was successfully recognized with accuracies of 93.67%and 91.53%,respectively,using testing data from 200 epochs.The training accuracy for visual speech recognition using the Indian English dataset was 77.48%and the test accuracy was 76.19%using 60 epochs.After integration,the accuracies of audiovisual speech recognition using the Indian English dataset for training and testing were 94.67%and 91.75%,respectively.
文摘Recognition of dynamic hand gestures in real-time is a difficult task because the system can never know when or from where the gesture starts and ends in a video stream.Many researchers have been working on visionbased gesture recognition due to its various applications.This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network(3D-CNN)and a Long Short-Term Memory(LSTM)network.The proposed architecture extracts spatial-temporal information from video sequences input while avoiding extensive computation.The 3D-CNN is used for the extraction of spectral and spatial features which are then given to the LSTM network through which classification is carried out.The proposed model is a light-weight architecture with only 3.7 million training parameters.The model has been evaluated on 15 classes from the 20BN-jester dataset available publicly.The model was trained on 2000 video-clips per class which were separated into 80%training and 20%validation sets.An accuracy of 99%and 97%was achieved on training and testing data,respectively.We further show that the combination of 3D-CNN with LSTM gives superior results as compared to MobileNetv2+LSTM.
基金This paper was partially supported by a project of the Shanghai Science and Technology Committee(18510760300)Anhui Natural Science Foundation(1908085MF178)Anhui Excellent Young Talents Support Program Project(gxyqZD2019069).
文摘In computer vision fields,3D object recognition is one of the most important tasks for many real-world applications.Three-dimensional convolutional neural networks(CNNs)have demonstrated their advantages in 3D object recognition.In this paper,we propose to use the principal curvature directions of 3D objects(using a CAD model)to represent the geometric features as inputs for the 3D CNN.Our framework,namely CurveNet,learns perceptually relevant salient features and predicts object class labels.Curvature directions incorporate complex surface information of a 3D object,which helps our framework to produce more precise and discriminative features for object recognition.Multitask learning is inspired by sharing features between two related tasks,where we consider pose classification as an auxiliary task to enable our CurveNet to better generalize object label classification.Experimental results show that our proposed framework using curvature vectors performs better than voxels as an input for 3D object classification.We further improved the performance of CurveNet by combining two networks with both curvature direction and voxels of a 3D object as the inputs.A Cross-Stitch module was adopted to learn effective shared features across multiple representations.We evaluated our methods using three publicly available datasets and achieved competitive performance in the 3D object recognition task.
基金Supported by the Shaanxi Province Key Research and Development Project (No. 2021GY-280)Shaanxi Province Natural Science Basic Research Program (No. 2021JM-459)the National Natural Science Foundation of China (No. 61772417)
文摘Because behavior recognition is based on video frame sequences,this paper proposes a behavior recognition algorithm that combines 3D residual convolutional neural network(R3D)and long short-term memory(LSTM).First,the residual module is extended to three dimensions,which can extract features in the time and space domain at the same time.Second,by changing the size of the pooling layer window the integrity of the time domain features is preserved,at the same time,in order to overcome the difficulty of network training and over-fitting problems,the batch normalization(BN)layer and the dropout layer are added.After that,because the global average pooling layer(GAP)is affected by the size of the feature map,the network cannot be further deepened,so the convolution layer and maxpool layer are added to the R3D network.Finally,because LSTM has the ability to memorize information and can extract more abstract timing features,the LSTM network is introduced into the R3D network.Experimental results show that the R3D+LSTM network achieves 91%recognition rate on the UCF-101 dataset.
基金supported by the National Natural Science Foundation of China(Nos.61862058,61962034,and 8226070356)in part by the Gansu Provincial Science&Technology Department(No.20JR10RA076)。
文摘Depression has become a major health threat around the world,especially for older people,so the effective detection method for depression is a great public health challenge.Electroencephalogram(EEG)can be used as a biomarker to effectively explore depression recognition.Motivated by the studies that multiple smaller scale kernels could increase nonlinear expression compared to a larger kernel,this article proposes a model named the three-dimensional multiscale kernels convolutional neural network model for the depression disorder recognition(3DMKDR),which is a three-dimensional convolutional neural network model with multiscale convolutional kernels for depression recognition based on EEG signals.A three-dimensional structure of the EEG is built by extending one-dimensional feature sequences into a two-dimensional electrode matrix to excavate the related spatiotemporal information among electrodes and the collected electrode matrix.By the major depressive disorder(MDD)and the multi-modal open dataset for mental-disorder analysis(MODMA)datasets,the experiment shows that the accuracies of depression recognition are up to99.86%and 98.01%in the subject-dependent experiment,and 95.80%and 82.27%in the subjectindependent experiment,which are higher than alternative competitive methods.The experimental results demonstrate that the proposed 3DMKDR is potentially useful for depression recognition in older persons in the future.
文摘While the internet has a lot of positive impact on society,there are negative components.Accessible to everyone through online platforms,pornography is,inducing psychological and health related issues among people of all ages.While a difficult task,detecting pornography can be the important step in determining the porn and adult content in a video.In this paper,an architecture is proposed which yielded high scores for both training and testing.This dataset was produced from 190 videos,yielding more than 19 h of videos.The main sources for the content were from YouTube,movies,torrent,and websites that hosts both pornographic and non-pornographic contents.The videos were from different ethnicities and skin color which ensures the models can detect any kind of video.A VGG16,Inception V3 and Resnet 50 models were initially trained to detect these pornographic images but failed to achieve a high testing accuracy with accuracies of 0.49,0.49 and 0.78 respectively.Finally,utilizing transfer learning,a convolutional neural network was designed and yielded an accuracy of 0.98.
基金supported by the General Program of the National Natural Science Foundation of China (62272234)the Enterprise Cooperation Project (2022h160)the Priority Academic Program Development of Jiangsu Higher Education Institutions Project.
文摘An action recognition network that combines multi-level spatiotemporal feature fusion with an attention mechanism is proposed as a solution to the issues of single spatiotemporal feature scale extraction,information redundancy,and insufficient extraction of frequency domain information in channels in 3D convolutional neural networks.Firstly,based on 3D CNN,this paper designs a new multilevel spatiotemporal feature fusion(MSF)structure,which is embedded in the network model,mainly through multilevel spatiotemporal feature separation,splicing and fusion,to achieve the fusion of spatial perceptual fields and short-medium-long time series information at different scales with reduced network parameters;In the second step,a multi-frequency channel and spatiotemporal attention module(FSAM)is introduced to assign different frequency features and spatiotemporal features in the channels are assigned corresponding weights to reduce the information redundancy of the feature maps.Finally,we embed the proposed method into the R3D model,which replaced the 2D convolutional filters in the 2D Resnet with 3D convolutional filters and conduct extensive experimental validation on the small and medium-sized dataset UCF101 and the largesized dataset Kinetics-400.The findings revealed that our model increased the recognition accuracy on both datasets.Results on the UCF101 dataset,in particular,demonstrate that our model outperforms R3D in terms of a maximum recognition accuracy improvement of 7.2%while using 34.2%fewer parameters.The MSF and FSAM are migrated to another traditional 3D action recognition model named C3D for application testing.The test results based on UCF101 show that the recognition accuracy is improved by 8.9%,proving the strong generalization ability and universality of the method in this paper.