Deep learning, especially through convolutional neural networks (CNNs) such as the U-Net 3D model, has revolutionized fault identification from seismic data, representing a significant leap over traditional methods. Our review traces the evolution of CNNs, emphasizing the adaptation and capabilities of the U-Net 3D model in automating seismic fault delineation with unprecedented accuracy. We find: 1) The transition from basic neural networks to sophisticated CNNs has enabled remarkable advances in image recognition, which are directly applicable to analyzing seismic data. The U-Net 3D model, with its innovative architecture, exemplifies this progress by providing a method for detailed and accurate fault detection with reduced manual interpretation bias. 2) The U-Net 3D model has demonstrated its superiority over traditional fault identification methods in several key areas: it has enhanced interpretation accuracy, increased operational efficiency, and reduced the subjectivity of manual methods. 3) Despite these achievements, challenges remain, such as the need for effective data preprocessing, the acquisition of high-quality annotated datasets, and model generalization across different geological conditions. Future research should therefore focus on developing more complex network architectures and innovative training strategies to further refine fault identification performance. Our findings confirm the transformative potential of deep learning, particularly CNNs like the U-Net 3D model, in geosciences, advocating for its broader integration to revolutionize geological exploration and seismic analysis.
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
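The parameter savings that depthwise separable convolution offers, and the way dilation widens a filter's field of view, can be illustrated with a little arithmetic. The sketch below is not the DDSC layer from the abstract; it only counts weights for a generic layer under stated assumptions (square kernels, no bias terms), and the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored, an assumption)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    std = standard_conv_params(64, 128, 3)
    sep = depthwise_separable_params(64, 128, 3)
    print(std, sep, round(1 - sep / std, 3))
    print(effective_kernel_size(3, 2))
```

Dilation leaves the weight count unchanged while enlarging the covered area, which is why a dilated depthwise kernel can take in more context at no extra parameter cost.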
In recent years, semantic segmentation on 3D point cloud data has attracted much attention. Unlike 2D images, where pixels are distributed regularly in the image domain, 3D point clouds in non-Euclidean space are irregular and inherently sparse. It is therefore very difficult to extract long-range contexts and effectively aggregate local features for semantic segmentation in 3D point cloud space. Most current methods focus on either local feature aggregation or long-range context dependency, but fail to directly establish a global-local feature extractor for point cloud semantic segmentation. In this paper, we propose a Transformer-based stratified graph convolutional network (SGT-Net), which enlarges the effective receptive field and builds direct long-range dependency. Specifically, we first propose a novel dense-sparse sampling strategy that provides dense local vertices and sparse long-distance vertices for the subsequent graph convolutional network (GCN). Second, we propose a multi-key self-attention mechanism based on the Transformer to further strengthen the weighting of crucial neighboring relationships and enlarge the effective receptive field. In addition, to further improve the efficiency of the network, we propose a similarity measurement module to determine whether the neighborhood near the center point is effective. We demonstrate the validity and superiority of our method on the S3DIS and ShapeNet datasets. Through ablation experiments and segmentation visualization, we verify that the SGT model can improve the performance of point cloud semantic segmentation.
Pedestrian attribute classification from a pedestrian image captured in surveillance scenarios is challenging due to diverse clothing appearances, varied poses, and different camera views. A multi-scale and multi-label convolutional neural network (MSMLCNN) is proposed to predict multiple pedestrian attributes simultaneously. The pedestrian attribute classification problem is first transformed into a multi-label problem comprising multiple binary attributes to be classified. Then, the multi-label problem is solved by fully connecting all binary attributes to multi-scale features with logistic regression functions. Moreover, the multi-scale features are obtained by concatenating the feature maps produced by multiple pooling layers of the MSMLCNN at different scales. Extensive experimental results show that the proposed MSMLCNN outperforms state-of-the-art pedestrian attribute classification methods by a large margin.
Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English linguistics and categorized it into three main categories: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using mel-frequency cepstral coefficients, and classification was performed using a one-dimensional convolutional neural network. Visual features were extracted using Dlib, and visual speech was then classified using a long short-term memory (LSTM) recurrent neural network. Finally, integration was performed using a deep convolutional network. The audio speech of Indian English was successfully recognized with accuracies of 93.67% and 91.53%, respectively, using testing data from 200 epochs. The training accuracy for visual speech recognition using the Indian English dataset was 77.48% and the test accuracy was 76.19% using 60 epochs. After integration, the accuracies of audiovisual speech recognition using the Indian English dataset for training and testing were 94.67% and 91.75%, respectively.
As an integrated application of modern information technologies and artificial intelligence, Prognostic and Health Management (PHM) is important for machine health monitoring. Prediction of tool wear is one of the symbolic applications of PHM technology in modern manufacturing systems and industry. In this paper, a multi-scale Convolutional Gated Recurrent Unit network (MCGRU) is proposed to process raw sensory data for tool wear prediction. At the bottom of the MCGRU, six parallel and independent branches with different kernel sizes are designed to form a multi-scale convolutional neural network, which improves adaptability to features of different time scales. These features of different scales extracted from raw data are then fed into a Deep Gated Recurrent Unit network to capture long-term dependencies and learn significant representations. At the top of the MCGRU, a fully connected layer and a regression layer are built for cutting tool wear prediction. Two case studies are performed to verify the capability and effectiveness of the proposed MCGRU network, and the results show that MCGRU outperforms several state-of-the-art baseline models.
In this paper, the complete process of constructing a 3D digital core with a fully convolutional neural network is described in detail. A large number of sandstone computed tomography (CT) images are used as training input for a fully convolutional neural network model. This model is used to reconstruct the three-dimensional (3D) digital core of Berea sandstone based on a small number of CT images. The Hamming distance, together with the Minkowski functions for porosity, average volume specific surface area, average curvature, and connectivity of both the real core and the digital reconstruction, is used to evaluate the accuracy of the proposed method. The results show that the reconstruction achieved relative errors of 6.26%, 1.40%, 6.06%, and 4.91% for the four Minkowski functions and a Hamming distance of 0.04479. This demonstrates that the proposed method can not only reconstruct the physical properties of real sandstone but also restore the real characteristics of the pore distribution in sandstone, which offers a new way to characterize the internal microstructure of rocks.
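Two of the evaluation quantities named above, the Hamming distance and porosity, are simple to compute on binarized voxel data. The following is a minimal sketch, assuming both cores are flattened to equal-length sequences of 0 (solid) and 1 (pore) and that the Hamming distance is normalized by the number of voxels; neither the encoding nor the normalization is stated in the abstract, and the function names are hypothetical.

```python
def hamming_distance(real, recon):
    """Fraction of voxels whose phase differs between two binary volumes.

    Normalizing by the voxel count is an assumption made for this sketch.
    """
    if len(real) != len(recon):
        raise ValueError("volumes must contain the same number of voxels")
    if not real:
        raise ValueError("volumes must not be empty")
    mismatches = sum(1 for a, b in zip(real, recon) if a != b)
    return mismatches / len(real)


def porosity(volume):
    """Pore fraction, assuming pore voxels are encoded as 1 (an assumption)."""
    return sum(volume) / len(volume)


def relative_error(reference, estimate):
    """Relative error of an estimated property against the reference value."""
    return abs(estimate - reference) / abs(reference)


if __name__ == "__main__":
    real = [0, 1, 1, 0, 0, 1, 0, 0]
    recon = [0, 1, 0, 0, 0, 1, 0, 1]
    print(hamming_distance(real, recon))
    print(relative_error(porosity(real), porosity(recon)))
```

The toy volumes show why both measures are needed: the porosities match exactly even though a quarter of the voxels differ, so a property-level error alone would hide the mismatch in pore placement.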
Esophageal disease is a common disorder of the digestive system that can severely affect the quality of life and prognosis of patients. Esophageal stenting is an effective treatment that has been widely used in clinical practice. However, esophageal stents of different types and parameters have varying adaptability and effectiveness for patients, and they need to be individually selected according to the patient's specific situation. The purpose of this study was to provide a reference for clinical doctors to choose suitable esophageal stents. We used 3D printing technology to fabricate esophageal stents with different ratios of thermoplastic polyurethane (TPU) to poly-ε-caprolactone (PCL) polymer, and established an artificial neural network model that could predict the radial force of esophageal stents based on the TPU content, the PCL content, and the printing parameters. We selected three optimal ratios for mechanical performance tests and evaluated the biomechanical effects of stents with different ratios on esophageal implantation, swallowing, and stent migration through finite element numerical simulation and in vitro simulation tests. The results showed that polymer stents with different ratios had different mechanical properties, affecting the effectiveness of stent expansion treatment and the possibility of postoperative complications of stent implantation.
A micro-expression lasts for a very short time and its intensity is very subtle. To address the problem of its low recognition rate, this paper proposes a new micro-expression recognition algorithm based on a three-dimensional convolutional neural network (3D-CNN), which can simultaneously extract two-dimensional features in the spatial domain and one-dimensional features in the time domain. The network structure is designed with the deep learning framework Keras, and dropout and batch normalization (BN) are combined with a three-dimensional visual geometry group block (3D-VGG-Block) to reduce the risk of overfitting while improving training speed. To address the lack of samples in the dataset, two methods, image flipping and small-amplitude flipping, are used for data augmentation. Finally, the recognition rate on the dataset reaches 69.11%. Compared with the current international average micro-expression recognition rate of about 67%, the proposed algorithm has a clear advantage in recognition rate.
When checking ice shape calculation software, its accuracy is judged by the proximity between the calculated ice shape and the typical test ice shape. Determining the typical test ice shape therefore becomes the key task of icing wind tunnel tests. In the icing wind tunnel test of the tail wing model of a large amphibious aircraft, the Romer Absolute Scanner is used to obtain the 3D point cloud data of the ice shape on the tail wing model in order to obtain an accurate typical test ice shape. Then, the batch-learning self-organizing map (BLSOM) neural network is used to obtain the 2D average ice shape along the model direction from the 3D point cloud data of the ice shape, while its tolerance band is calculated using a probabilistic statistical method. The results show that the combination of the 2D average ice shape and its tolerance band can effectively represent the 3D characteristics of the test ice shape and can be used as the typical test ice shape for comparative analysis with the calculated ice shape.
AIM: To explore a segmentation algorithm based on deep learning to achieve accurate diagnosis and treatment of patients with retinal fluid. METHODS: A two-dimensional (2D) fully convolutional network for retinal segmentation was employed. To address the category imbalance in retinal optical coherence tomography (OCT) images, the network parameters and loss function of the 2D fully convolutional network were modified. This network ignores the correlations between corresponding positions in spatially adjacent images. Thus, we proposed a three-dimensional (3D) fully convolutional network for segmentation of retinal OCT images. RESULTS: The algorithm was evaluated according to segmentation accuracy, Kappa coefficient, and F1 score. For the 3D fully convolutional network proposed in this paper, the overall segmentation accuracy is 99.56%, the Kappa coefficient is 98.47%, and the F1 score for retinal fluid is 95.50%. CONCLUSION: OCT image segmentation algorithms based on deep learning are primarily founded on 2D convolutional networks. The 3D network architecture proposed in this paper reduces the influence of category imbalance, realizes end-to-end segmentation of volume images, and achieves optimal segmentation results. The segmentation maps are practically the same as the manual annotations of doctors and can provide doctors with more accurate diagnostic data.
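The three evaluation measures named in the RESULTS (accuracy, Kappa coefficient, and F1 score) can be derived from a confusion matrix. The sketch below assumes the simplest two-class case (fluid versus background) and uses hypothetical function names; the paper may well compute Kappa over more classes, which the abstract does not specify.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of pixels labeled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2 x 2 confusion matrix (binary case assumed)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    # Toy pixel counts, not taken from the paper.
    tp, fp, fn, tn = 40, 10, 10, 40
    print(accuracy(tp, fp, fn, tn), cohen_kappa(tp, fp, fn, tn), f1_score(tp, fp, fn))
```

Kappa and F1 are reported alongside accuracy because, with a strong class imbalance such as small fluid regions in a large background, accuracy alone can remain high even when the minority class is segmented poorly.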
The background pattern of patterned fabrics is complex and strongly interferes with the extraction of defect features. Traditional machine vision algorithms rely on hand-designed features, which are greatly affected by background patterns and make it difficult to extract defect features effectively. Therefore, a convolutional neural network (CNN) with automatic feature extraction is proposed. Building on the two-stage detection model Faster R-CNN, ResNet-50 is used as the backbone network, the problem of defects with extreme aspect ratios is addressed by improving the initialization algorithm for the prior-box aspect ratios, and an improved multi-scale model is designed to improve the detection of small defects. Cascade R-CNN is introduced to improve the accuracy of defect detection, the online hard example mining (OHEM) algorithm is used to strengthen the learning of hard samples and so reduce the interference of complex backgrounds with defect detection on patterned fabrics, and the focal loss is adopted as the loss function to reduce the impact of sample imbalance. To verify the effectiveness of the improved algorithm, a comparative defect detection experiment was set up. The experimental results show that the accuracy of the proposed defect detection algorithm for patterned fabrics can reach 95.7%, and that it can accurately locate defects and meet the actual needs of the factory.
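The focal loss mentioned above down-weights easy examples so that hard, rare ones dominate training. A minimal sketch of the standard binary form follows; the default values gamma = 2 and alpha = 0.25 are commonly used settings and are assumptions here, since the abstract does not give the paper's hyperparameters.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one prediction.

    p: predicted probability of the positive class, strictly between 0 and 1.
    y: ground-truth label, 0 or 1.
    gamma, alpha: focusing and balancing terms (values assumed, not from the paper).
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def binary_cross_entropy(p, y):
    """Plain cross-entropy, for comparison."""
    return -math.log(p if y == 1 else 1.0 - p)


if __name__ == "__main__":
    for p in (0.9, 0.6, 0.1):
        print(p, binary_cross_entropy(p, 1), binary_focal_loss(p, 1))
```

With gamma set to 0 and alpha set to 0.5 the expression reduces to half the cross-entropy, which is a quick way to sanity-check an implementation.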
Background: The use of remote photoplethysmography (rPPG) to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years. Existing methods are primarily based on a single-scale region of interest (ROI). However, some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space. Also, existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information, which is crucial in pulse extraction problems, resulting in insufficient spatiotemporal feature modeling. Methods: Here, we propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution (SSTC) and dimension separable attention (DSAT). First, to solve the problem of a single-scale ROI, we constructed a multi-scale feature space for initial signal separation. Second, SSTC and DSAT were designed for efficient spatiotemporal correlation modeling, which increased the information interaction between the long-span time and space dimensions; this placed more emphasis on temporal features. Results: The signal-to-noise ratio (SNR) of the proposed network reached 9.58 dB on the PURE dataset and 6.77 dB on the UBFC-rPPG dataset, outperforming state-of-the-art algorithms. Conclusions: The results showed that fusing multi-scale signals yielded better results than methods based on only single-scale signals. The proposed SSTC and dimension-separable attention mechanism will contribute to more accurate pulse signal extraction.
As neural radiance fields continue to advance in 3D content representation, the copyright issues surrounding implicitly represented 3D models become increasingly pressing. In response to this challenge, this paper treats the embedding and extraction of neural radiance field watermarks as inverse problems of image transformations and proposes a scheme for protecting neural radiance field copyrights using invertible neural network watermarking. Leveraging 2D image watermarking technology for 3D scene protection, the scheme embeds watermarks within the training images of neural radiance fields through the forward process of invertible neural networks and extracts them from images rendered by neural radiance fields through the reverse process, thereby ensuring copyright protection for both the neural radiance fields and the associated 3D scenes. However, challenges such as information loss during rendering and deliberate tampering necessitate an image quality enhancement module to increase the scheme's robustness. This module restores distorted images through neural network processing before watermark extraction. Additionally, embedding watermarks in each training image enables watermark information to be extracted from multiple viewpoints. Our proposed watermarking method achieves a PSNR (Peak Signal-to-Noise Ratio) exceeding 37 dB for images containing watermarks and 22 dB for recovered watermarked images, as evaluated on the Lego, Hotdog, and Chair datasets. These results demonstrate the efficacy of our scheme in enhancing copyright protection.
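The PSNR figures quoted above follow from the mean squared error between two images. The sketch below shows the standard definition for images given as flat sequences of pixel values, assuming 8-bit data (a peak value of 255); the function name and the flat-list representation are illustrative choices, not the paper's code.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value=255 assumes 8-bit images (an assumption for this sketch).
    """
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [52, 55, 61, 59, 79, 61, 76, 61]
    watermarked = [53, 55, 60, 59, 80, 61, 75, 62]
    print(psnr(original, watermarked))
```

Because the scale is logarithmic, each halving of the mean squared error raises the PSNR by about 3 dB, so a gap such as 37 dB versus 22 dB corresponds to a large difference in pixel-level distortion.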
In the railway system, fasteners provide damping, maintain the track gauge, and adjust the track level. Therefore, routine maintenance and inspection of fasteners are important to ensure the safe operation of track lines. Currently, assessment methods for fastener tightness include manual observation, acoustic wave detection, and image detection. These methods suffer from limitations such as low accuracy and efficiency and susceptibility to interference and misjudgment, and an accurate, stable, and fast detection method is lacking. Targeting the small deformation and large elastic change of fasteners from full loosening to full tightening, this study proposes high-precision surface-structured light technology for fastener detection, fastener deformation feature extraction based on the centerline projection distance, and a fastener tightness regression method based on neural networks. First, the method uses a 3D camera to obtain a fastener point cloud and then segments the elastic rod area based on registration with the iterative closest point algorithm. Principal component analysis is used to calculate the normal vector of the segmented elastic rod surface and extract the points on the centerline of the elastic rod. Each point is projected onto the upper surface of the bolt to calculate the projection distance. Subsequently, the mapping relationship between the projection distance sequence and fastener tightness is established, and the influence of each parameter on the fastener tightness prediction is analyzed. Finally, a fastener detection scene was set up at the track experimental base, data were collected, and the algorithm was verified. The results showed that the root-mean-square error (RMSE) between the fastener tightness regression value produced by the algorithm and the actual measured value was 0.2196 mm, a significant improvement over other tightness detection methods, realizing effective fastener tightness regression.
The tradeoff between efficiency and model size of the convolutional neural network (CNN) is an essential issue for applications of CNN-based algorithms to diverse real-world tasks. Although deep learning-based methods have achieved significant improvements in image super-resolution (SR), current CNN-based techniques mostly involve massive numbers of parameters and high computational complexity, limiting their practical applications. In this paper, we present a fast and lightweight framework, named weighted multi-scale residual network (WMRN), for a better tradeoff between SR performance and computational efficiency. With the modified residual structure, depthwise separable convolutions (DS Convs) are employed to improve the efficiency of convolutional operations. Furthermore, several weighted multi-scale residual blocks (WMRBs) are stacked to enhance the multi-scale representation capability. In the reconstruction subnetwork, a group of Conv layers is introduced to filter feature maps to reconstruct the final high-quality image. Extensive experiments were conducted to evaluate the proposed model, and the comparative results with several state-of-the-art algorithms demonstrate the effectiveness of WMRN.
Cardiomyopathy is one of the most serious public health threats. Precise structural and functional cardiac measurement is an essential step for clinical diagnosis and follow-up treatment planning. Cardiologists are often required to draw endocardial and epicardial contours of the left ventricle (LV) manually during routine clinical diagnosis or treatment planning. This task is time-consuming and error-prone. Therefore, it is necessary to develop a fully automated end-to-end semantic segmentation method for cardiac magnetic resonance (CMR) imaging datasets. However, due to the low image quality and the deformation caused by the heartbeat, there is no effective tool for the fully automated end-to-end cardiac segmentation task. In this work, we propose a multi-scale segmentation network (MSSN) for left ventricle segmentation. It can effectively learn myocardium and blood pool structure representations from 2D short-axis CMR image slices in a multi-scale way. Specifically, our method employs both parallel and serial dilated convolution layers with different dilation rates to capture multi-scale semantic features. Moreover, we design graduated up-sampling layers with subpixel layers as the decoder to reconstruct lost spatial information and produce accurate segmentation masks. We validated our method using 164 T1 mapping CMR images and showed that it outperforms advanced convolutional neural network (CNN) models. In validation, we achieved a Dice Similarity Coefficient (DSC) of 78.96%.
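The Dice Similarity Coefficient reported above measures the overlap between a predicted mask and a reference mask. A minimal sketch of the standard definition follows, assuming binary masks flattened to sequences of 0 and 1; returning 1.0 when both masks are empty is a convention chosen for this sketch, not something the abstract specifies.

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    total = sum(pred) + sum(truth)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [0, 1, 1, 1, 0, 0]
    reference = [0, 0, 1, 1, 1, 0]
    print(dice_coefficient(predicted, reference))
```

For binary masks the Dice coefficient equals the F1 score of the foreground class, so it penalizes both missed and spurious foreground pixels.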
In this research, a method called ANNMG is presented to integrate Artificial Neural Networks and Geostatistics for optimum mineral reserve evaluation. The name ANNMG stands for Artificial Neural Network Model integrated with Geostatistics. In this procedure, the Artificial Neural Network was trained, tested, and validated using assay values obtained from exploratory drillholes. Next, the validated model was used to generalize mineral grades at known and unknown sampled locations inside the drilling region. Finally, the reproduced and generalized assay values were combined and fed to geostatistics in order to develop a geological 3D block model. The regression analysis revealed that the predicted sample grades were in close proximity to the actual sample grades. The generalized grades from the ANNMG show that this process could be used to complement exploration activities, thereby reducing drilling requirements. It could also be an effective mineral reserve evaluation method that could produce an optimum block model for mine design.
Benefiting from the development of hyperspectral imaging technology, hyperspectral image (HSI) classification has become a valuable direction in remote sensing image processing. Recently, researchers have found a connection between convolutional neural networks (CNNs) and Gabor filters. Therefore, some Gabor-based CNN methods have been proposed for HSI classification. However, most Gabor-based CNN methods still manually generate Gabor filters whose parameters are empirically set and remain unchanged during the CNN learning process. Moreover, these methods require patch cubes as network inputs. Such patch cubes may contain interference pixels, which will negatively affect the classification results. To address these problems, in this paper, we propose a learnable three-dimensional (3D) Gabor convolutional network with global affinity attention for HSI classification. More precisely, the learnable 3D Gabor convolution kernel is constructed from the 3D Gabor filter, which can be learned and updated during the training process. Furthermore, spatial and spectral global affinity attention modules are introduced to capture more discriminative features between spatial locations and spectral bands in the patch cube, thus alleviating the interfering-pixel problem. Experimental results on three well-known HSI datasets (including two natural crop scenarios and one urban scenario) have demonstrated that the proposed network can achieve powerful classification performance and outperforms widely used machine-learning-based and deep-learning-based methods.
In robot-assisted surgery projects, researchers should be able to perform fast 3D reconstruction. Usually, 2D images acquired with common diagnostic equipment such as UT, CT, and MRI are not sufficient or complete enough for an accurate 3D reconstruction. Existing interpolation methods for approximating voxels without values consume a large amount of execution time. A novel algorithm based on the generalized regression neural network (GRNN) is introduced, which can interpolate unknown voxels quickly and reliably. The GRNN interpolation is used to produce new 2D images between each pair of successive ultrasonic images. It is shown that combining GRNN with the image distance transformation can produce higher-quality 3D shapes. The results of this method are compared experimentally with those of other interpolation methods. The comparison shows that this method can decrease the overall time consumption of online 3D reconstruction.
Abstract: Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To balance complexity against model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
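The parameter saving from depthwise separable convolution, and the wider field of view from dilation, can be checked with simple arithmetic. The sketch below is illustrative only: the function names and the layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) are assumptions chosen for the example, not values taken from the paper, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Span covered along one axis by a k-tap kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
dsc = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {std}, depthwise separable: {dsc}, ratio: {dsc / std:.3f}")
print(f"3-tap kernel with dilation 2 spans {effective_kernel_size(3, 2)} pixels")
```

Dilation enlarges the span without adding weights, which is why it suits a lightweight design.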
Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. U20A20197 and 62306187, and by the Foundation of the Ministry of Industry and Information Technology (TC220H05X-04).
Abstract: In recent years, semantic segmentation of 3D point cloud data has attracted much attention. Unlike 2D images, where pixels are distributed regularly in the image domain, 3D point clouds in non-Euclidean space are irregular and inherently sparse. It is therefore very difficult to extract long-range contexts and effectively aggregate local features for semantic segmentation in 3D point cloud space. Most current methods focus on either local feature aggregation or long-range context dependency, but fail to directly establish a global-local feature extractor for point cloud semantic segmentation. In this paper, we propose a Transformer-based stratified graph convolutional network (SGT-Net), which enlarges the effective receptive field and builds direct long-range dependency. Specifically, we first propose a novel dense-sparse sampling strategy that provides dense local vertices and sparse long-distance vertices for the subsequent graph convolutional network (GCN). Second, we propose a multi-key self-attention mechanism based on the Transformer to further augment the weights of crucial neighboring relationships and enlarge the effective receptive field. In addition, to further improve the efficiency of the network, we propose a similarity measurement module to determine whether the neighborhood near the center point is effective. We demonstrate the validity and superiority of our method on the S3DIS and ShapeNet datasets. Through ablation experiments and segmentation visualization, we verify that the SGT model can improve the performance of point cloud semantic segmentation.
Funding: Supported by the National Natural Science Foundation of China (No. 61602191, 61672521, 61375037, 61473291, 61572501, 61572536, 61502491, 61372107, 61401167); the Natural Science Foundation of Fujian Province (No. 2016J01308); the Scientific and Technology Funds of Quanzhou (No. 2015Z114); the Scientific and Technology Funds of Xiamen (No. 3502Z20173045); the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (No. ZQN-PY418, ZQN-YX403); and the Scientific Research Funds of Huaqiao University (No. 16BS108).
Abstract: Pedestrian attribute classification from a pedestrian image captured in surveillance scenarios is challenging due to diverse clothing appearances, varied poses and different camera views. A multi-scale and multi-label convolutional neural network (MSMLCNN) is proposed to predict multiple pedestrian attributes simultaneously. The pedestrian attribute classification problem is first transformed into a multi-label problem comprising multiple binary attributes to be classified. Then, the multi-label problem is solved by fully connecting all binary attributes to multi-scale features with logistic regression functions. Moreover, the multi-scale features are obtained by concatenating the feature maps produced by multiple pooling layers of the MSMLCNN at different scales. Extensive experimental results show that the proposed MSMLCNN outperforms state-of-the-art pedestrian attribute classification methods by a large margin.
Abstract: Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English linguistics and organized the work into three main categories: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using mel-frequency cepstral coefficients, and classification was performed using a one-dimensional convolutional neural network. Visual features were extracted using Dlib, and visual speech was then classified using a long short-term memory (LSTM) recurrent neural network. Finally, integration was performed using a deep convolutional network. The audio speech of Indian English was successfully recognized with accuracies of 93.67% and 91.53%, respectively, using testing data from 200 epochs. The training accuracy for visual speech recognition using the Indian English dataset was 77.48% and the test accuracy was 76.19% using 60 epochs. After integration, the accuracies of audiovisual speech recognition using the Indian English dataset for training and testing were 94.67% and 91.75%, respectively.
Funding: Supported in part by the Natural Science Foundation of China (Grant Nos. 51835009, 51705398); the Shaanxi Province 2020 Natural Science Basic Research Plan (Grant No. 2020JQ-042); and the Aeronautical Science Foundation (Grant No. 2019ZB070001).
Abstract: As an integrated application of modern information technologies and artificial intelligence, Prognostic and Health Management (PHM) is important for machine health monitoring. Prediction of tool wear is one of the symbolic applications of PHM technology in modern manufacturing systems and industry. In this paper, a multi-scale Convolutional Gated Recurrent Unit network (MCGRU) is proposed to address raw sensory data for tool wear prediction. At the bottom of MCGRU, six parallel and independent branches with different kernel sizes are designed to form a multi-scale convolutional neural network, which augments the adaptability to features of different time scales. These features of different scales extracted from raw data are then fed into a Deep Gated Recurrent Unit network to capture long-term dependencies and learn significant representations. At the top of the MCGRU, a fully connected layer and a regression layer are built for cutting tool wear prediction. Two case studies are performed to verify the capability and effectiveness of the proposed MCGRU network, and the results show that MCGRU outperforms several state-of-the-art baseline models.
Funding: The National Natural Science Foundation of China (No. 41274129); Chuan Qing Drilling Engineering Company's Scientific Research Project "Seismic detection technology and application of complex carbonate reservoir in Sulige Majiagou Formation"; the 2018 Central Supporting Local Co-construction Fund (No. 80000-18Z0140504); and the Construction and Development of Universities in 2019, Joint Support for Geophysics (Double First-Class center, 80000-19Z0204).
Abstract: In this paper, the complete process of constructing a 3D digital core with a fully convolutional neural network is described in detail. A large number of sandstone computed tomography (CT) images are used as training input for a fully convolutional neural network model. This model is used to reconstruct the three-dimensional (3D) digital core of Berea sandstone based on a small number of CT images. The Hamming distance, together with the Minkowski functions for porosity, average volume specific surface area, average curvature, and connectivity of both the real core and the digital reconstruction, is used to evaluate the accuracy of the proposed method. The results show that the reconstruction achieved relative errors of 6.26%, 1.40%, 6.06%, and 4.91% for the four Minkowski functions and a Hamming distance of 0.04479. This demonstrates that the proposed method can not only reconstruct the physical properties of real sandstone but also restore the real characteristics of the pore distribution in sandstone, which offers a new way to characterize the internal microstructure of rocks.
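The two kinds of evaluation measure named in this abstract can be sketched on binarized voxel data. This is a minimal illustration under stated assumptions: the abstract does not say how the Hamming distance was normalized, so the sketch assumes it is the fraction of mismatched voxels; the convention that 1 marks a pore voxel, the function names, and the toy data are all hypothetical.

```python
def normalized_hamming(a, b):
    """Fraction of positions where two equal-length binary sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def porosity(voxels):
    """Fraction of pore voxels, assuming 1 = pore and 0 = solid."""
    return sum(1 for v in voxels if v == 1) / len(voxels)


def relative_error(estimate, reference):
    """Relative error of an estimate against a non-zero reference value."""
    return abs(estimate - reference) / abs(reference)


# Toy flattened volumes (hypothetical data, not from the paper).
real_core = [1, 0, 0, 1, 0, 0, 1, 0]
reconstruction = [1, 0, 0, 1, 0, 1, 1, 0]

print("Hamming distance:", normalized_hamming(real_core, reconstruction))
print("porosity error:", relative_error(porosity(reconstruction), porosity(real_core)))
```

The Hamming distance compares the volumes voxel by voxel, while porosity compares a bulk statistic, so the two can disagree.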
Funding: Nanning Technology and Innovation Special Program (20204122) and Research Grant for 100 Talents of Guangxi Plan.
Abstract: Esophageal disease is a common disorder of the digestive system that can severely affect the quality of life and prognosis of patients. Esophageal stenting is an effective treatment that has been widely used in clinical practice. However, esophageal stents of different types and parameters vary in their adaptability and effectiveness for patients, and they need to be individually selected according to the patient's specific situation. The purpose of this study was to provide a reference for clinical doctors to choose suitable esophageal stents. We used 3D printing technology to fabricate esophageal stents with different ratios of thermoplastic polyurethane (TPU)/poly-ε-caprolactone (PCL) polymer, and established an artificial neural network model that could predict the radial force of esophageal stents based on the content of TPU and PCL and the print parameters. We selected three optimal ratios for mechanical performance tests and evaluated the biomechanical effects of different ratios of stents on esophageal implantation, swallowing, and stent migration processes through finite element numerical simulation and in vitro simulation tests. The results showed that different ratios of polymer stents had different mechanical properties, affecting the effectiveness of stent expansion treatment and the possibility of postoperative complications of stent implantation.
Funding: Supported by the Shaanxi Province Key Research and Development Project (No. 2021GY-280); the Shaanxi Province Natural Science Basic Research Program Project (No. 2021JM-459); the National Natural Science Foundation of China (No. 61834005, 61772417, 61802304, 61602377, 61634004); and the Shaanxi Province International Science and Technology Cooperation Project (No. 2018KW-006).
Abstract: A micro-expression lasts for a very short time and its intensity is very subtle. To address the resulting low recognition rate, this paper proposes a new micro-expression recognition algorithm based on a three-dimensional convolutional neural network (3D-CNN), which can simultaneously extract two-dimensional features in the spatial domain and one-dimensional features in the time domain. The network structure is designed with the deep learning framework Keras, and dropout and batch normalization (BN) are combined with a three-dimensional visual geometry group block (3D-VGG-Block) to reduce the risk of overfitting while improving training speed. To address the lack of samples in the data set, two methods, image flipping and small-amplitude flipping, are used for data augmentation. Finally, the recognition rate on the data set reaches 69.11%. Compared with the current international average micro-expression recognition rate of about 67%, the proposed algorithm has an advantage in recognition rate.
Funding: Supported by the AG600 project of AVIC General Huanan Aircraft Industry Co., Ltd.
Abstract: When checking ice shape calculation software, its accuracy is judged by the proximity between the calculated ice shape and the typical test ice shape. Therefore, determining the typical test ice shape becomes the key task of icing wind tunnel tests. In the icing wind tunnel test of the tail wing model of a large amphibious aircraft, in order to obtain an accurate typical test ice shape, the Romer Absolute Scanner is used to obtain the 3D point cloud data of the ice shape on the tail wing model. Then, the batch-learning self-organizing map (BLSOM) neural network is used to obtain the 2D average ice shape along the model direction based on the 3D point cloud data of the ice shape, while its tolerance band is calculated using the probabilistic statistical method. The results show that the combination of the 2D average ice shape and its tolerance band can represent the 3D characteristics of the test ice shape effectively, and can be used as the typical test ice shape for comparative analysis with the calculated ice shape.
Funding: Supported by the National Science Foundation of China (No. 81800878); the Interdisciplinary Program of Shanghai Jiao Tong University (No. YG2017QN24); the Key Technological Research Projects of Songjiang District (No. 18sjkjgg24); and the Bethune Langmu Ophthalmological Research Fund for Young and Middle-aged People (No. BJ-LM2018002J).
Abstract: AIM: To explore a segmentation algorithm based on deep learning to achieve accurate diagnosis and treatment of patients with retinal fluid. METHODS: A two-dimensional (2D) fully convolutional network for retinal segmentation was employed. In order to solve the category imbalance in retinal optical coherence tomography (OCT) images, the network parameters and loss function based on the 2D fully convolutional network were modified. This network, however, ignores the correlations of corresponding positions among spatially adjacent images. Thus, we proposed a three-dimensional (3D) fully convolutional network for segmentation in retinal OCT images. RESULTS: The algorithm was evaluated according to segmentation accuracy, Kappa coefficient, and F1 score. For the 3D fully convolutional network proposed in this paper, the overall segmentation accuracy rate is 99.56%, the Kappa coefficient is 98.47%, and the F1 score of retinal fluid is 95.50%. CONCLUSION: OCT image segmentation algorithms based on deep learning are primarily founded on 2D convolutional networks. The 3D network architecture proposed in this paper reduces the influence of category imbalance, realizes end-to-end segmentation of volume images, and achieves optimal segmentation results. The segmentation maps are practically the same as the manual annotations of doctors, and can provide doctors with more accurate diagnostic data.
Funding: National Key Research and Development Project, China (No. 2018YFB1308800).
Abstract: The background pattern of patterned fabrics is complex and strongly interferes with the extraction of defect features. Traditional machine vision algorithms rely on hand-designed features, which are greatly affected by background patterns and struggle to extract flaw features effectively. Therefore, a convolutional neural network (CNN) with automatic feature extraction is proposed. Building on the two-stage detection model Faster R-CNN, ResNet-50 is used as the backbone network; the problem of flaws with extreme aspect ratios is addressed by improving the initialization algorithm for the aspect ratios of the prior frames, and an improved multi-scale model is designed to improve the detection of small defects. Cascade R-CNN is introduced to improve the accuracy of defect detection, the online hard example mining (OHEM) algorithm is used to strengthen the learning of hard samples and reduce the interference of complex backgrounds on defect detection in patterned fabrics, and the focal loss is used as the loss function to reduce the impact of sample imbalance. To verify the effectiveness of the improved algorithm, a comparative defect detection experiment was set up. The experimental results show that the accuracy of the proposed defect detection algorithm for patterned fabrics can reach 95.7%, and that it can accurately locate defects and meet the actual needs of the factory.
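To make concrete how a focal loss reduces the impact of sample imbalance, the sketch below implements the commonly used binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). This is an illustration of the general technique, not the paper's implementation: the function name is hypothetical, and gamma = 2 and alpha = 0.25 are widely used defaults rather than values reported in the abstract.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one prediction.

    p: predicted probability of the positive class.
    y: ground-truth label, 1 (positive) or 0 (negative).
    gamma: focusing parameter; larger values down-weight easy examples more.
    alpha: weight for the positive class (1 - alpha for the negative class).
    """
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)   # confident, correct prediction
hard = binary_focal_loss(0.1, 1)   # confident, wrong prediction
print(f"easy example loss: {easy:.6f}, hard example loss: {hard:.6f}")
```

With gamma set to 0 and alpha set to 1, the expression reduces to ordinary cross-entropy for positive samples, which is a quick sanity check on the formula.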
Funding: Supported by the National Natural Science Foundation of China (61903336, 61976190) and the Natural Science Foundation of Zhejiang Province (LY21F030015).
Abstract: Background: The use of remote photoplethysmography (rPPG) to estimate blood volume pulse in a noncontact manner has been an active research topic in recent years. Existing methods are primarily based on a single-scale region of interest (ROI). However, some noise signals that are not easily separated in a single-scale space can be easily separated in a multi-scale space. Also, existing spatiotemporal networks mainly focus on local spatiotemporal information and do not emphasize temporal information, which is crucial in pulse extraction problems, resulting in insufficient spatiotemporal feature modelling. Methods: Here, we propose a multi-scale facial video pulse extraction network based on separable spatiotemporal convolution (SSTC) and dimension separable attention (DSAT). First, to solve the problem of a single-scale ROI, we constructed a multi-scale feature space for initial signal separation. Second, SSTC and DSAT were designed for efficient spatiotemporal correlation modeling, which increased the information interaction between the long-span time and space dimensions; this placed more emphasis on temporal features. Results: The signal-to-noise ratio (SNR) of the proposed network reached 9.58 dB on the PURE dataset and 6.77 dB on the UBFC-rPPG dataset, outperforming state-of-the-art algorithms. Conclusions: The results showed that fusing multi-scale signals yielded better results than methods based on only single-scale signals. The proposed SSTC and dimension-separable attention mechanism will contribute to more accurate pulse signal extraction.
Funding: Supported by the National Natural Science Foundation of China, with Fund Numbers 62272478 and 62102451; the National Defense Science and Technology Independent Research Project (Intelligent Information Hiding Technology and Its Applications in a Certain Field); and the Science and Technology Innovation Team Innovative Research Project "Research on Key Technologies for Intelligent Information Hiding", with Fund Number ZZKY20222102.
Abstract: As neural radiance fields continue to advance in 3D content representation, the copyright issues surrounding 3D models oriented towards implicit representation become increasingly pressing. In response to this challenge, this paper treats the embedding and extraction of neural radiance field watermarks as inverse problems of image transformations and proposes a scheme for protecting neural radiance field copyrights using invertible neural network watermarking. Leveraging 2D image watermarking technology for 3D scene protection, the scheme embeds watermarks within the training images of neural radiance fields through the forward process in invertible neural networks and extracts them from images rendered by neural radiance fields through the reverse process, thereby ensuring copyright protection for both the neural radiance fields and the associated 3D scenes. However, challenges such as information loss during rendering and deliberate tampering necessitate the design of an image quality enhancement module to increase the scheme's robustness. This module restores distorted images through neural network processing before watermark extraction. Additionally, embedding watermarks in each training image enables watermark information to be extracted from multiple viewpoints. Our proposed watermarking method achieves a PSNR (Peak Signal-to-Noise Ratio) value exceeding 37 dB for images containing watermarks and 22 dB for recovered watermarked images, as evaluated on the Lego, Hotdog, and Chair datasets. These results demonstrate the efficacy of our scheme in enhancing copyright protection.
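The PSNR figures quoted here follow the standard definition, PSNR = 10 log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between two images. The sketch below illustrates that definition on flattened pixel lists; the function name, the 8-bit range, and the toy pixel values are assumptions for illustration, and the paper's own evaluation pipeline is not described in the abstract.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf                      # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 8-bit pixel values (hypothetical data): a cover image and a lightly
# perturbed copy standing in for a watermarked image.
cover = [52, 55, 61, 66, 70, 61, 64, 73]
marked = [53, 55, 60, 66, 71, 61, 65, 73]
print(f"PSNR: {psnr(cover, marked):.2f} dB")
```

Higher values mean the watermarked image is closer to the original, so a threshold such as 37 dB is a statement about how invisible the embedding is.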
Funding: Supported by the Fundamental Research Funds for the Central Universities of China (Grant No. 2023JBMC014).
Abstract: In the railway system, fasteners provide damping, maintain the track gauge, and adjust the track level. Therefore, routine maintenance and inspection of fasteners are important to ensure the safe operation of track lines. Currently, assessment methods for fastener tightness include manual observation, acoustic wave detection, and image detection. These methods suffer from low accuracy and efficiency and are prone to interference and misjudgment, so an accurate, stable, and fast detection method is lacking. Targeting the small deformation and large elastic change of fasteners from fully loose to fully tight, this study proposes high-precision surface-structured light technology for fastener detection, fastener deformation feature extraction based on the centerline projection distance, and a fastener tightness regression method based on neural networks. First, the method uses a 3D camera to obtain a fastener point cloud and then segments the elastic rod area based on iterative closest point algorithm registration. Principal component analysis is used to calculate the normal vector of the segmented elastic rod surface and extract the points on the centerline of the elastic rod. Each point is projected onto the upper surface of the bolt to calculate the projection distance. Subsequently, the mapping relationship between the projection distance sequence and fastener tightness is established, and the influence of each parameter on the fastener tightness prediction is analyzed. Finally, a fastener detection scene was set up in the track experimental base, data were collected, and the algorithm was verified. The results showed that the root-mean-square error (RMSE) between the fastener tightness regression value produced by the algorithm and the actual measured value was 0.2196 mm, a significant improvement over other tightness detection methods, realizing effective fastener tightness regression.
Funding: The National Natural Science Foundation of China (61772149, 61866009, 61762028, U1701267, 61702169); the Guangxi Science and Technology Project (2019GXNSFFA245014, ZY20198016, AD18281079, AD18216004); the Natural Science Foundation of Hunan Province (2020JJ3014); and the Guangxi Colleges and Universities Key Laboratory of Intelligent Processing of Computer Images and Graphics (GIIP202001).
Abstract: The tradeoff between efficiency and model size of the convolutional neural network (CNN) is an essential issue for applications of CNN-based algorithms to diverse real-world tasks. Although deep learning-based methods have achieved significant improvements in image super-resolution (SR), current CNN-based techniques mostly have massive numbers of parameters and high computational complexity, limiting their practical applications. In this paper, we present a fast and lightweight framework, named weighted multi-scale residual network (WMRN), for a better tradeoff between SR performance and computational efficiency. With the modified residual structure, depthwise separable convolutions (DS Convs) are employed to improve the efficiency of convolutional operations. Furthermore, several weighted multi-scale residual blocks (WMRBs) are stacked to enhance the multi-scale representation capability. In the reconstruction subnetwork, a group of Conv layers is introduced to filter feature maps to reconstruct the final high-quality image. Extensive experiments were conducted to evaluate the proposed model, and the comparative results with several state-of-the-art algorithms demonstrate the effectiveness of WMRN.
Funding: This work was supported by the Project of Sichuan Outstanding Young Scientific and Technological Talents (19JCQN0003); the major Project of Education Department in Sichuan (17ZA0063 and 2017JQ0030); in part by the Natural Science Foundation for Young Scientists of CUIT (J201704); and the Sichuan Science and Technology Program (2019JDRC0077).
Abstract: Cardiomyopathy is one of the most serious public health threats. Precise structural and functional cardiac measurement is an essential step for clinical diagnosis and follow-up treatment planning. Cardiologists are often required to manually draw endocardial and epicardial contours of the left ventricle (LV) in routine clinical diagnosis or during treatment planning. This task is time-consuming and error-prone. Therefore, it is necessary to develop a fully automated end-to-end semantic segmentation method for cardiac magnetic resonance (CMR) imaging datasets. However, due to low image quality and the deformation caused by the heartbeat, there is no effective tool for the fully automated end-to-end cardiac segmentation task. In this work, we propose a multi-scale segmentation network (MSSN) for left ventricle segmentation. It can effectively learn myocardium and blood pool structure representations from 2D short-axis CMR image slices in a multi-scale way. Specifically, our method employs both parallel and serial dilated convolution layers with different dilation rates to capture multi-scale semantic features. Moreover, we design graduated up-sampling layers with subpixel layers as the decoder to reconstruct lost spatial information and produce accurate segmentation masks. We validated our method using 164 T1 mapping CMR images and showed that it outperforms advanced convolutional neural network (CNN) models. In validation, we achieved a Dice Similarity Coefficient (DSC) of 78.96%.
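The Dice Similarity Coefficient used as the validation metric here is conventionally defined as DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The sketch below shows that definition on flattened binary masks; the function name, the convention of returning 1.0 when both masks are empty, and the toy masks are assumptions for illustration rather than details from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0                           # both masks empty: treated as perfect agreement
    return 2.0 * intersection / total


# Toy flattened masks (hypothetical data): 1 = left-ventricle pixel.
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
manual = [0, 1, 1, 0, 0, 0, 1, 1]
print(f"DSC: {dice_coefficient(predicted, manual):.4f}")
```

Because the score ignores true negatives, it is less flattering than pixel accuracy when the structure of interest occupies a small part of the image.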
Funding: The authors thank the management of Sierra Rutile Company for providing the drillhole dataset used in this study and the Japanese Ministry of Education, Science and Technology (MEXT) Scholarship for academic funding.
Abstract: In this research, a method called ANNMG is presented to integrate Artificial Neural Networks and Geostatistics for optimum mineral reserve evaluation. ANNMG stands for Artificial Neural Network Model integrated with Geostatistics. In this procedure, the Artificial Neural Network was trained, tested and validated using assay values obtained from exploratory drillholes. Next, the validated model was used to generalize mineral grades at sampled and unsampled locations inside the drilling region. Finally, the reproduced and generalized assay values were combined and fed to geostatistics in order to develop a geological 3D block model. The regression analysis revealed that the predicted sample grades were close to the actual sample grades. The generalized grades from the ANNMG show that this process could be used to complement exploration activities, thereby reducing drilling requirements. It could also be an effective mineral reserve evaluation method that could produce an optimum block model for mine design.
Funding: Project supported by the Fundamental Research Funds in Heilongjiang Provincial Universities (Grant No. 145109218) and the Natural Science Foundation of Heilongjiang Province of China (Grant No. LH2020F050).
Abstract: Benefiting from the development of hyperspectral imaging technology, hyperspectral image (HSI) classification has become a valuable direction in remote sensing image processing. Recently, researchers have found a connection between convolutional neural networks (CNNs) and Gabor filters. Therefore, some Gabor-based CNN methods have been proposed for HSI classification. However, most Gabor-based CNN methods still manually generate Gabor filters whose parameters are empirically set and remain unchanged during the CNN learning process. Moreover, these methods require patch cubes as network inputs. Such patch cubes may contain interference pixels, which will negatively affect the classification results. To address these problems, in this paper, we propose a learnable three-dimensional (3D) Gabor convolutional network with global affinity attention for HSI classification. More precisely, the learnable 3D Gabor convolution kernel is constructed from the 3D Gabor filter, which can be learned and updated during the training process. Furthermore, spatial and spectral global affinity attention modules are introduced to capture more discriminative features between spatial locations and spectral bands in the patch cube, thus alleviating the interfering pixels problem. Experimental results on three well-known HSI datasets (including two natural crop scenarios and one urban scenario) have demonstrated that the proposed network can achieve powerful classification performance and outperforms widely used machine-learning-based and deep-learning-based methods.
Abstract: In robot-assisted surgery projects, researchers should be able to perform fast 3D reconstruction. Usually, 2D images acquired with common diagnostic equipment such as UT, CT and MRI are not sufficient or complete enough for an accurate 3D reconstruction. Some interpolation methods exist for approximating voxels that have no value, but they consume a large amount of execution time. A novel algorithm based on the generalized regression neural network (GRNN) is introduced, which can interpolate unknown voxels quickly and reliably. The GRNN interpolation is used to produce new 2D images between each pair of successive ultrasonic images. It is shown that combining the GRNN with image distance transformation can produce higher quality 3D shapes. The results of this method are compared in practice with other interpolation methods. The comparison shows that this method can decrease the overall time consumption of online 3D reconstruction.
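A GRNN prediction is, in its usual formulation, a Gaussian-kernel weighted average of the training targets, which is why it needs no iterative training and can be fast. The sketch below illustrates that idea for a single pixel whose intensity is known on a few slices and is estimated at an intermediate slice position. It is a simplified one-dimensional illustration, not the paper's algorithm: the function name, the smoothing parameter sigma, and the sample values are hypothetical, and the distance-transform step mentioned in the abstract is not modeled.

```python
import math


def grnn_predict(x, train_x, train_y, sigma=0.5):
    """Estimate y at x as a Gaussian-weighted average of the training targets."""
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi in train_x]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x is too far from all training samples for this sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical data: intensity of one pixel on slices at positions 0, 1, 2, 3.
slice_positions = [0.0, 1.0, 2.0, 3.0]
intensities = [10.0, 20.0, 40.0, 30.0]

# Estimate the same pixel on a new slice halfway between slices 1 and 2.
print(f"interpolated intensity: {grnn_predict(1.5, slice_positions, intensities):.2f}")
```

A smaller sigma makes the estimate follow the nearest slices more closely, while a larger sigma smooths across more slices.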