Text accounts for most of the information resources on the Internet, which places ever higher demands on the accuracy of text classification. In this manuscript, we first design a hybrid model, bidirectional encoder representations from transformers-hierarchical attention networks-dilated convolution networks (BERT_HAN_DCN), built on the BERT pre-trained model for its strong feature-extraction ability. The model draws on the advantages of both HAN and DCN to capture rich semantic information, fusing contextual semantic features with hierarchical characteristics. Second, the traditional softmax algorithm increases the learning difficulty of samples of the same kind, making similar features harder to distinguish; AM-softmax is therefore introduced to replace it. Finally, the fused model is validated on two datasets, where it shows superior accuracy and F1-score compared with general single models built on the BERT pre-trained model, such as HAN and DCN. In addition, the improved AM-softmax network model is superior to the general softmax network model.
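To make the softmax replacement concrete, here is a minimal sketch of the additive-margin (AM) softmax loss for a single sample. It is not the authors' implementation: the function names, the scale of 30 and the margin of 0.35 are illustrative assumptions, and the inputs are taken to be cosine similarities between a normalized feature and each normalized class weight.

```python
import math


def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """AM-softmax loss for one sample (hypothetical helper).

    cosines: cosine similarity between the feature and each class weight.
    target:  index of the true class.
    scale, margin: assumed hyperparameters, not taken from the abstract.
    """
    logits = [scale * c for c in cosines]
    # The margin is subtracted from the true-class cosine only.
    logits[target] = scale * (cosines[target] - margin)
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def softmax_loss(cosines, target, scale=30.0):
    """Plain scaled softmax loss, i.e. AM-softmax with zero margin."""
    return am_softmax_loss(cosines, target, scale=scale, margin=0.0)
```

For the same cosine scores the margin version returns a larger loss, so training has to push the true-class cosine further above the others before the loss becomes small.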
With the successful application and breakthrough of deep learning technology in image segmentation, there has been continuous development in the field of seismic facies interpretation using convolutional neural networks. These intelligent and automated methods significantly reduce manual labor, particularly in the laborious task of manually labeling seismic facies. However, their extensive demand for training data limits their wider application. To overcome this challenge, we adopt the UNet architecture as the foundational network structure for seismic facies classification, since it has demonstrated effective segmentation results even with small-sample training data. Additionally, we integrate spatial pyramid pooling and dilated convolution modules into the network architecture to enhance the perception of spatial information across a broader range. The seismic facies classification test on the public data from the F3 block verifies the superior performance of the proposed improved network structure in delineating seismic facies boundaries. Comparative analysis against the traditional UNet model shows that our method achieves more accurate classification results, as evidenced by various evaluation metrics for image segmentation; the classification accuracy reaches 96%. Furthermore, the results of seismic facies classification in the seismic slice dimension further confirm the superior performance of the proposed method, which accurately defines the range of different seismic facies. This approach holds significant potential for analyzing geological patterns and extracting valuable depositional information.
In ocean exploration, side-scan sonar (SSS) plays a very important role and can quickly depict seabed topography. Mounting the SSS on an autonomous underwater vehicle (AUV) and performing semantic segmentation of SSS images in real time enables online recognition of submarine geomorphology or targets, which benefits submarine detection. However, because of the complexity of the marine environment, various noises in the ocean pollute the sonar image, which also suffers from intensity inhomogeneity. In this paper, we propose a novel neural network architecture named dilated convolutional neural network (DcNet) that can run in real time while addressing these issues and providing accurate semantic segmentation. The proposed architecture is an encoder-decoder network: the encoder gradually reduces the spatial dimension of the input image and the decoder recovers the details of the target. The core of our network is a novel block connection named DCblock, which mainly uses dilated convolution and depthwise separable convolution between the encoder and decoder to attain more context while still retaining high accuracy. Furthermore, our proposed method performs super-resolution reconstruction to enlarge the dataset with high-quality images. We compared our network with other common semantic segmentation networks on an NVIDIA Jetson TX2 using our sonar image datasets. Experimental results show that the inference speed of the proposed network significantly outperforms state-of-the-art architectures while its accuracy remains comparable, which indicates its potential applications not only in AUVs equipped with SSS but also in marine exploration.
Emotion recognition from speech data is an active and emerging area of research that plays an important role in numerous applications, such as robotics, virtual reality, behavior assessments, and emergency call centers. Recently, researchers have developed many techniques in this field that use deep learning approaches to improve accuracy, but the recognition rate is still not convincing. Our main aim is to develop a new technique that increases the recognition rate at a reasonable computational cost. In this paper, we propose a one-dimensional dilated convolutional neural network (1D-DCNN) for speech emotion recognition (SER) that combines hierarchical features learning blocks (HFLBs) with a bi-directional gated recurrent unit (BiGRU). We designed a one-dimensional CNN that enhances the speech signals using spectral analysis and extracts their hidden patterns, which are fed into a stack of one-dimensional dilated networks called HFLBs. Each HFLB contains one dilated convolution layer (DCL), one batch normalization (BN) layer, and one leaky ReLU layer in order to extract the emotional features using a hierarchical correlation strategy. Furthermore, the learned emotional features are fed into a BiGRU in order to adjust the global weights and to recognize the temporal cues. The final state of the deep BiGRU is passed through a softmax classifier to produce the probabilities of the emotions. The proposed model was evaluated on three benchmark datasets, IEMOCAP, EMO-DB, and RAVDESS, on which it achieved 72.75%, 91.14%, and 78.01% accuracy, respectively.
To address the problem of image information loss, dilated convolution is introduced and a novel multi-scale dilated convolutional neural network (MDCNN) is proposed. Dilated convolution can aggregate multi-scale image information without reducing the resolution. The first layer of the network uses a spectral convolution step to reduce dimensionality. The multi-scale aggregation stage then extracts multi-scale features by applying dilated convolution and shortcut connections. The extracted features, which represent properties of the data, are fed through Softmax to predict the samples. MDCNN achieved overall accuracies of 99.58% and 99.92% on two public datasets, Indian Pines and Pavia University. Compared with four other existing models, the results illustrate that MDCNN can extract more discriminative features and achieve higher classification performance.
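The claim that dilated convolution gathers multi-scale context without reducing resolution can be illustrated with a small one-dimensional sketch. This is not the MDCNN architecture; the function names, the zero padding, and the dilation rates (1, 2, 4) are assumptions chosen only for illustration.

```python
def dilated_conv1d_same(signal, kernel, dilation):
    """1D dilated cross-correlation with zero 'same' padding (odd kernel size).

    The output has the same length as the input for every dilation rate.
    """
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):         # outside the signal counts as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_features(signal, kernel, dilations=(1, 2, 4)):
    """Apply the same kernel at several dilation rates (hypothetical helper)."""
    return [dilated_conv1d_same(signal, kernel, d) for d in dilations]
```

Each branch sees a wider neighbourhood than the previous one, yet all outputs keep the input length, so they can be summed or concatenated position by position.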
Completing defect classification with only a few defect samples is a key challenge in the production of mobile phone screens. This paper proposes an attention-relation network for mobile phone screen defect classification. The architecture of the attention-relation network contains two modules: a feature extraction module and a feature metric module. Unlike other few-shot models, our model applies an attention mechanism to metric learning when measuring the distance between features, so as to attend to the correlation between features and suppress unwanted information. In addition, we combine dilated convolution and skip connections to extract more feature information for follow-up processing. We validate the attention-relation network on the mobile phone screen defect dataset. The experimental results show that the classification accuracy of the attention-relation network is 0.9486 under the 5-way 1-shot training strategy and 0.9039 under the 5-way 5-shot setting. It classifies mobile phone screen defects well and clearly outperforms the compared models.
This study addresses the limitations of Transformer models in image feature extraction, particularly their lack of inductive bias for visual structures. Compared with Convolutional Neural Networks (CNNs), Transformers are more sensitive to the hyperparameters of optimizers, which leads to a lack of stability and slow convergence. To tackle these challenges, we propose the Convolution-based Efficient Transformer Image Feature Extraction Network (CEFormer) as an enhancement of the Transformer architecture. Our model incorporates E-Attention, depthwise separable convolution, and dilated convolution to introduce crucial inductive biases, such as translation invariance, locality, and scale invariance, into the Transformer framework. Additionally, we implement a lightweight convolution module to process the input images, resulting in faster convergence and improved stability. The result is an efficient image feature extraction network that combines convolution with the Transformer. Experimental results on ImageNet-1k (Top-1 accuracy) demonstrate that the proposed network achieves better accuracy while maintaining high computational speed. It achieves up to 85.0% accuracy across various model sizes on image classification, outperforming various baseline models. When integrated into the Mask Region-based Convolutional Neural Network (Mask R-CNN) framework as a backbone network, CEFormer outperforms other models and achieves the highest mean Average Precision (mAP) scores. This research presents a significant advancement in Transformer-based image feature extraction, balancing performance and computational efficiency.
Time series forecasting plays an important role in various fields, such as energy, finance, transport, and weather. Temporal convolutional networks (TCNs) based on dilated causal convolution have been widely used in time series forecasting. However, two problems weaken the performance of TCNs. One is that, in dilated causal convolution, causal convolution leads to the receptive fields of outputs being concentrated in the earlier part of the input sequence, whereas recent input information is severely lost. The other is that the distribution shift problem in time series has not been adequately solved. To address the first problem, we propose a subsequence-based dilated convolution method (SDC). By using multiple convolutional filters to convolve elements of neighboring subsequences, the method extracts temporal features from a growing receptive field via a growing subsequence rather than a single element. Ultimately, the receptive field of each output element can cover the whole input sequence. To address the second problem, we propose a difference and compensation method (DCM). The method reduces the discrepancies between and within the input sequences by difference operations and then compensates the outputs for the information lost due to difference operations. Based on SDC and DCM, we further construct a temporal subsequence-based convolutional network with difference (TSCND) for time series forecasting. The experimental results show that TSCND can reduce prediction mean squared error by 7.3% and save runtime, compared with state-of-the-art models and vanilla TCN.
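For readers unfamiliar with the baseline being improved, the following is a minimal sketch of the dilated causal convolution used in a vanilla TCN, together with the usual receptive-field formula. It does not implement SDC or DCM; the function names and the kernel size and dilation schedule in the usage note are assumptions.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """y[t] = sum_j kernel[j] * x[t - j * dilation], with x[t] = 0 for t < 0.

    Each output depends only on the current and past inputs (causality).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked causal layers, one layer per dilation rate."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With a kernel size of 2 and dilations 1, 2, 4, 8, `receptive_field(2, [1, 2, 4, 8])` gives 16 time steps, which shows how the field grows exponentially with depth when the dilation doubles per layer.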
Nuclear magnetic resonance imaging of breasts often presents complex backgrounds. Breast tumors exhibit varying sizes, uneven intensity, and indistinct boundaries. These characteristics can lead to challenges such as low accuracy and incorrect segmentation during tumor segmentation. We therefore propose a two-stage breast tumor segmentation method leveraging multi-scale features and boundary attention mechanisms. Initially, the breast region of interest is extracted to isolate the breast area from surrounding tissues and organs. Subsequently, we devise a fusion network incorporating multi-scale features and boundary attention mechanisms for breast tumor segmentation. We incorporate multi-scale parallel dilated convolution modules into the network, enhancing its capability to segment tumors of various sizes through multi-scale convolution and novel fusion techniques. Additionally, attention and boundary detection modules are included to augment the network's capacity to locate tumors by capturing nonlocal dependencies in both spatial and channel domains. Furthermore, a hybrid loss function with boundary weight is employed to address sample class imbalance and to enhance the network's boundary maintenance capability through an additional loss. The method was evaluated using breast data from 207 patients at Ruijin Hospital, resulting in a 6.64% increase in Dice similarity coefficient compared with the benchmark U-Net. Experimental results demonstrate the superiority of the method over other segmentation techniques, with fewer model parameters.
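The Dice similarity coefficient reported above is defined as twice the overlap between the predicted and reference masks divided by the sum of their sizes. A minimal sketch follows, assuming binary masks flattened into sequences of 0 and 1; the function name and the convention for two empty masks are assumptions.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total
```

Because the denominator counts only foreground pixels, the score is less dominated by the large background area than plain pixel accuracy, which is why it is common for small tumor regions.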
As a core part of battlefield situational awareness, air target intention recognition plays an important role in modern air operations. To address insufficient feature extraction and misclassification in intention recognition, this paper designs an air target intention recognition method (KGTLIR) based on Knowledge Graph and Deep Learning. First, an intention recognition model based on Deep Learning is constructed to mine the temporal relationship of intention features using dilated causal convolution and the spatial relationship of intention features using a graph attention mechanism. Meanwhile, the accuracy, recall, and F1-score after each iteration are used to dynamically adjust the sample weights and reduce the probability of misclassification. Next, an intention recognition model based on Knowledge Graph is constructed to predict the probability of occurrence of the target's different intentions. Finally, the results of the two models are fused by evidence theory to obtain the target's operational intention. Experiments show that the intention recognition accuracy of the KGTLIR model can reach 98.48%, which is not only better than most air target intention recognition methods but also comes with better interpretability and trustworthiness.
Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information. Methods: We therefore propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve the feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods; it achieved state-of-the-art results in terms of evaluation metrics such as recall and F1-score. https://github.com/Yong Z-Lee/TD-DCCAM.
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To balance complexity against model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
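As background for why depthwise separable layers are lightweight, the weight counts of a standard convolution and a depthwise separable one can be compared directly. This is a generic sketch, not the DDSC layer itself and not a reproduction of the 65.02% figure; the function names and the channel sizes in the usage note are assumptions, and biases are ignored.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2D convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise mix."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise
```

For a 3×3 layer with 32 input and 64 output channels, the standard form needs 18,432 weights and the separable form 2,336, a ratio of 1/64 + 1/9. Dilating the depthwise filter changes where its taps are sampled but not how many weights it has, so the wider field of view comes at no parameter cost.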
Deep convolutional neural networks (CNNs) with strong learning abilities have been used in the field of image denoising. However, some CNNs depend on a single deep network to train an image denoising model, which performs poorly in complex scenes. To address this problem, we propose a hybrid denoising CNN (HDCNN). HDCNN is composed of a dilated block (DB), a RepVGG block (RVB), a feature refinement block (FB), and a single convolution. DB combines a dilated convolution, batch normalization (BN), common convolutions, and the ReLU activation function to obtain more context information. RVB uses a parallel combination of convolution, BN, and ReLU to extract complementary width features. FB obtains more accurate information by refining the features obtained from the RVB. A single convolution works together with a residual learning operation to construct a clean image. These key components give HDCNN good performance in image denoising. Experiments show that the proposed HDCNN achieves a good denoising effect on public datasets.
Detecting non-motor drivers' helmets has significant implications for traffic control. Currently, most helmet detection methods are susceptible to complex backgrounds and lack accuracy and robustness in small-object detection, which makes them unsuitable for practical application scenarios. Therefore, this paper proposes a new helmet-wearing detection algorithm based on You Only Look Once version 5 (YOLOv5). First, the Dilated convolution In Coordinate Attention (DICA) layer is added to the backbone network. DICA combines the coordinate attention mechanism with atrous convolution to replace the original convolution layer, which enlarges the receptive field of the network to capture more contextual information. It also reduces the network's learning of unnecessary background features and directs attention to small objects. Second, the Rebuild Bidirectional Feature Pyramid Network (Re-BiFPN) is used as a feature extraction network. Re-BiFPN uses cross-scale feature fusion to combine the semantic information features at the high level with the spatial information features at the bottom level, which helps the model learn object features at different scales. Verified on the proposed "Helmet Wearing dataset for Non-motor Drivers (HWND)," the results show that the proposed model is superior to current detection algorithms, with a mean average precision (mAP) of 94.3% under complex backgrounds.
A face-mask object detection model incorporating a hybrid dilation convolutional network, termed ResNet Hybrid-dilation-convolution Face-mask-detector (RHF), is proposed in this paper. Furthermore, a lightweight face-mask dataset named Light Masked Face Dataset (LMFD) and a medium-sized face-mask dataset named Masked Face Dataset (MFD), both with data augmentation applied, are also constructed in this paper. The hybrid dilation convolutional network is able to expand the perception of the convolutional kernel without concern about the discontinuity of image information during the convolution process. On the two datasets constructed above, the trained models are significantly improved in terms of detection performance, training time, and other related metrics. On the MFD dataset of 55,905 images, the RHF model requires roughly 10 hours less training time than ResNet50 and achieves better detection results, with an mAP of 93.45%.
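The "discontinuity of image information" mentioned above is often called the gridding effect: stacking dilated layers with the same rate leaves input positions that no output ever reads. The sketch below checks this in one dimension. The dilation schedules (2, 2, 2) and (1, 2, 5) are illustrative assumptions and are not taken from the RHF paper.

```python
def reachable_offsets(dilations, kernel_size=3):
    """Input offsets seen by one output after stacking dilated layers."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        # Each layer lets every reachable position look `j * d` steps away.
        offsets = {o + j * d for o in offsets for j in range(-half, half + 1)}
    return offsets


def has_holes(dilations, kernel_size=3):
    """True if some position inside the receptive field is never sampled."""
    offsets = reachable_offsets(dilations, kernel_size)
    return any(o not in offsets for o in range(min(offsets), max(offsets) + 1))
```

With three 3-tap layers, rates (2, 2, 2) reach only even offsets and leave every odd position unread, whereas mixed rates such as (1, 2, 5) cover every offset from -8 to 8. Mixing rates in this way is the idea behind hybrid dilated convolution.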
Traditional compressed sensing algorithms reconstruct images by iteratively optimizing over a small number of measured values; the computation is complex and the reconstruction time is long. Deep learning-based compressed sensing algorithms can greatly shorten the reconstruction time, but they mostly focus on the reconstruction network. The random measurement matrix cannot measure the image features well, which limits the improvement in reconstructed image quality. Two networks are proposed to solve this problem. The first is IReconNet, an improved version of ReconNet that replaces the traditional linear random measurement matrix with an adaptive nonlinear measurement network, which greatly improves the reconstruction quality and anti-noise performance. Because the measured values extracted by the measurement network also retain the spatial information of the image, the image can be reconstructed with bilinear interpolation and dilated convolution; on this basis a second network, USDCNN, is proposed. On the BSD500 dataset, at sampling rates of 0.25, 0.10, 0.04, and 0.01, the average peak signal-to-noise ratio (PSNR) of USDCNN is 1.62 dB, 1.31 dB, 1.47 dB, and 1.95 dB higher than that of MSRNet. Experiments show that the average reconstruction time of USDCNN is 0.2705 s, 0.3671 s, 0.3602 s, and 0.3929 s shorter than that of ReconNet. Moreover, USDCNN also has a clear advantage in anti-noise performance.
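The PSNR values quoted above follow the standard definition, 10·log10(MAX² / MSE). A minimal sketch is given below, assuming images flattened into sequences of pixel values with a peak value of 255; the function name and these conventions are assumptions rather than details from the paper.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Since the scale is logarithmic, a gain of about 1.5 dB corresponds to roughly a 30% reduction in mean squared error.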
With the rapid progress of deep convolutional neural networks, several applications of crowd counting have been proposed and explored in the literature. In congested scene monitoring, a variety of crowd density estimation approaches have been developed. Understanding highly congested scenes for crowd counting during the Muslim gatherings of Hajj and Umrah is a challenging task: a large number of individuals stand close together, which makes them hard for detection techniques to recognize, and the crowd can vary from low density to high density. To deal with such highly congested scenes, we propose the Congested Scene Crowd Counting Network (CSCC-Net), which uses the first ten layers of VGG-16 as its core network because of its strong and robust transfer learning capability. A dilated (hole) convolutional neural network is used at the back end to widen the receptive field and extract a large range of information from the image without losing its original resolution. The dilated convolutional neural network is chosen mainly to expand the kernel size without changing other parameters. Moreover, several loss functions have been applied to strengthen the evaluation accuracy of the model. Finally, all experiments were evaluated on prominent datasets, namely ShanghaiTech Parts A and B, UCF_CC_50, and UCF_QNRF. Our model achieved remarkable results: 68.0 and 9.0 MAE on ShanghaiTech Parts A and B, 199.1 MAE on UCF_CC_50, and 99.8 on UCF_QNRF, respectively.
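The statement that dilation expands the kernel without changing other parameters can be made precise: a k×k kernel with dilation rate d spans k + (k − 1)(d − 1) pixels per side while still holding k² weights. The short sketch below uses hypothetical function names and is not tied to the CSCC-Net configuration.

```python
def effective_kernel_size(kernel_size, dilation):
    """Side length spanned by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def kernel_weight_count(kernel_size):
    """Weights per 2D filter channel; independent of the dilation rate."""
    return kernel_size * kernel_size
```

A 3×3 kernel therefore covers 5×5 pixels at rate 2 and 9×9 at rate 4, each time with the same nine weights.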
Deep learning methods have made numerous achievements in time series anomaly detection. We adopt a speech generation model from the field of artificial intelligence, changing the convolution layer of a general convolutional neural network to a residual unit structure by adding identity mapping, and expanding the receptive field of the model by using dilated causal convolution. Based on the dilated causal convolution network and the log probability density function, anomalous events are detected according to their anomaly scores. The validity of the method is verified on simulation data, and the method is then applied to the observed data from the Pingliang geoelectric field observation station in Gansu Province. The results show that one month before the Wenchuan M_S 8.0, Lushan M_S 7.0 and Minxian-Zhangxian M_S 6.6 earthquakes, the daily cumulative error of the log probability density of the predicted results at Pingliang Station suddenly decreases, which is consistent with the actual earthquake anomalies in a certain time range. After analyzing combined factors including the spatial electromagnetic environment and the variation of micro-fissures before the earthquakes, we explain the possible causes of the geoelectric field anomalies preceding them. The successful application of deep learning to observed geoelectric field data may help improve both the utilization rate of the data and the efficiency of detection. In addition, it may provide technical support for further seismic research.
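One common way to turn a log probability density into an anomaly score is to evaluate each observation under the model's predictive distribution and flag low-density points. The sketch below assumes a Gaussian predictive distribution with a predicted mean and standard deviation; the abstract does not state the distribution, so this choice and the function names are assumptions.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of x under a normal distribution N(mean, std^2)."""
    if std <= 0:
        raise ValueError("std must be positive")
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def anomaly_scores(observed, predicted_means, predicted_stds):
    """Negative log density per time step; larger means more anomalous."""
    return [
        -gaussian_log_pdf(x, m, s)
        for x, m, s in zip(observed, predicted_means, predicted_stds)
    ]
```

An observation far from its predicted mean receives a low log density and therefore a high score; summing the log densities over a day gives a daily cumulative value of the kind tracked in the results.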
Two-dimensional endoscopic images are susceptible to interferences such as specular reflections and monotonous texture illumination, hindering accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to enhance robustness against lighting interferences. The resulting Pseudo-Siamese structure-based disparity regression model simplifies left-right image comparison, improving accuracy and efficiency. The model was evaluated using a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results demonstrate significant improvement in the network's resistance to lighting interference without substantially increasing parameters. Moreover, the model exhibited faster convergence during training, contributing to overall performance enhancement. This study advances endoscopic image processing accuracy and has potential implications for surgical robot applications in complex environments.
Crowd counting provides an important foundation for public security and urban management. Due to the existence of small targets and large density variations in crowd images, crowd counting is a challenging task. Mainstream methods usually apply convolutional neural networks (CNNs) to regress a density map, which requires annotations of individual persons and counts. Weakly-supervised methods can avoid detailed labeling and only require counts as annotations of images, but existing methods fail to achieve satisfactory performance because a global perspective field and multi-level information are usually ignored. We propose a weakly-supervised method, DTCC, which effectively combines multi-level dilated convolution and transformer methods to realize end-to-end crowd counting. Its main components include a recursive swin transformer and a multi-level dilated convolution regression head. The recursive swin transformer combines a pyramid visual transformer with a fine-tuned recursive pyramid structure to capture deep multi-level crowd features, including global features. The multi-level dilated convolution regression head includes multi-level dilated convolution and a linear regression head for the feature extraction module. This module can capture both low- and high-level features simultaneously to enhance the receptive field. In addition, two regression head fusion mechanisms realize dynamic and mean fusion counting. Experiments on four well-known benchmark crowd counting datasets (UCF_CC_50, ShanghaiTech, UCF_QNRF, and JHU-Crowd++) show that DTCC achieves results superior to other weakly-supervised methods and comparable to fully-supervised methods.
Funding: Fundamental Research Funds for the Central Universities, China (No. 2232018D3-17).
Abstract: Most Internet resources are in text form, which places ever higher demands on the accuracy of text classification. In this manuscript, we first design a hybrid model, bidirectional encoder representation from transformers-hierarchical attention networks-dilated convolutions networks (BERT_HAN_DCN), built on the BERT pre-trained model, which has a superior feature-extraction ability. The model combines the advantages of the HAN and DCN models to obtain rich semantic information by fusing contextual semantic features with hierarchical characteristics. Second, the traditional softmax algorithm increases the learning difficulty of samples of the same kind, making it more difficult to distinguish similar features; AM-softmax is therefore introduced to replace the traditional softmax. Finally, the fused model is validated on two datasets, where it shows higher accuracy and F1-score than general single models such as HAN and DCN based on the BERT pre-trained model. In addition, the improved AM-softmax network model is superior to the general softmax network model.
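The replacement of softmax by AM-softmax mentioned above can be sketched as follows. This is a minimal illustration of the additive-margin idea, not the authors' implementation: the function name `am_softmax_loss` is hypothetical, and the scale `s=30.0` and margin `m=0.35` are commonly cited illustrative defaults rather than values stated in the abstract. The margin is subtracted from the target-class cosine only, so the true class must win by at least `m` before the loss becomes small.

```python
import math

def am_softmax_loss(cosines, target, s=30.0, m=0.35):
    """Additive-margin softmax loss for a single sample (illustrative sketch).

    cosines: cosine similarity between the normalized feature and each
             normalized class weight vector (values in [-1, 1]).
    target:  index of the true class.
    s, m:    scale and additive margin (assumed values, not from the abstract).
    """
    # Subtract the margin from the target-class cosine only, then scale.
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp over all classes.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_z - logits[target]

cosines = [0.8, 0.6, -0.1]
plain = am_softmax_loss(cosines, target=0, m=0.0)    # ordinary scaled softmax loss
margin = am_softmax_loss(cosines, target=0, m=0.35)  # same sample, with margin
print(plain < margin)
```

With `m=0.0` the function reduces to the usual cross-entropy on scaled cosine logits, so the two calls show how the margin raises the loss for the same sample.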
Funding: Funded by the Fundamental Research Project of CNPC Geophysical Key Lab (2022DQ0604-4) and the Strategic Cooperation Technology Projects of China National Petroleum Corporation and China University of Petroleum-Beijing (ZLZX 202003).
Abstract: With the successful application and breakthrough of deep learning technology in image segmentation, there has been continuous development in the field of seismic facies interpretation using convolutional neural networks. These intelligent and automated methods significantly reduce manual labor, particularly in the laborious task of manually labeling seismic facies. However, their extensive demand for training data limits their wider application. To overcome this challenge, we adopt the UNet architecture as the foundational network structure for seismic facies classification, as it has demonstrated effective segmentation results even with small-sample training data. Additionally, we integrate spatial pyramid pooling and dilated convolution modules into the network architecture to enhance the perception of spatial information across a broader range. The seismic facies classification test on the public data from the F3 block verifies the superior performance of our proposed improved network structure in delineating seismic facies boundaries. Comparative analysis against the traditional UNet model reveals that our method achieves more accurate predictive classification results, as evidenced by various evaluation metrics for image segmentation. The classification accuracy reaches 96%. Furthermore, the results of seismic facies classification in the seismic slice dimension further confirm the superior performance of our proposed method, which accurately defines the range of different seismic facies. This approach holds significant potential for analyzing geological patterns and extracting valuable depositional information.
Funding: This work is partially supported by the National Key Research and Development Program of China (No. 2016YFC0301400).
Abstract: In ocean exploration, side-scan sonar (SSS) plays a very important role and can quickly depict seabed topography. Mounting an SSS on an autonomous underwater vehicle (AUV) and performing semantic segmentation of SSS images in real time enables online recognition of submarine geomorphology or targets, which is conducive to submarine detection. However, because of the complexity of the marine environment, various noises in the ocean pollute the sonar image, which also suffers from intensity inhomogeneity. In this paper, we propose a novel neural network architecture named dilated convolutional neural network (DcNet) that can run in real time while addressing the above-mentioned issues and providing accurate semantic segmentation. The proposed architecture uses an encoder-decoder network to gradually reduce the spatial dimension of the input image and recover the details of the target, respectively. The core of our network is a novel block connection named DCblock, which mainly uses dilated convolution and depthwise separable convolution between the encoder and decoder to attain more context while still retaining high accuracy. Furthermore, our proposed method performs super-resolution reconstruction to enlarge the dataset with high-quality images. We compared our network with other common semantic segmentation networks on an NVIDIA Jetson TX2 using our sonar image datasets. Experimental results show that the inference speed of the proposed network significantly outperforms that of state-of-the-art architectures while its accuracy remains comparable, which indicates its potential applications not only in AUVs equipped with SSS but also in marine exploration.
Funding: Supported by the National Research Foundation of Korea, funded by the Korean Government through the Ministry of Science and ICT under Grant NRF-2020R1F1A1060659, and in part by the 2020 Faculty Research Fund of Sejong University.
Abstract: Emotion recognition from speech data is an active and emerging area of research that plays an important role in numerous applications, such as robotics, virtual reality, behavior assessments, and emergency call centers. Recently, researchers have developed many techniques in this field to improve accuracy using several deep learning approaches, but the recognition rate is still not convincing. Our main aim is to develop a new technique that increases the recognition rate at a reasonable computational cost. In this paper, we propose a new technique, a one-dimensional dilated convolutional neural network (1D-DCNN) for speech emotion recognition (SER), that utilizes hierarchical features learning blocks (HFLBs) with a bi-directional gated recurrent unit (BiGRU). We designed a one-dimensional CNN that uses spectral analysis to enhance the speech signals and extracts their hidden patterns, which are fed into a stack of one-dimensional dilated networks called HFLBs. Each HFLB contains one dilated convolution layer (DCL), one batch normalization (BN) layer, and one leaky_relu (Relu) layer to extract the emotional features using a hierarchical correlation strategy. Furthermore, the learned emotional features are fed into a BiGRU to adjust the global weights and recognize the temporal cues. The final state of the deep BiGRU is passed through a softmax classifier to produce the probabilities of the emotions. The proposed model was evaluated on three benchmark datasets, IEMOCAP, EMO-DB, and RAVDESS, on which it achieved 72.75%, 91.14%, and 78.01% accuracy, respectively.
Funding: Sponsored by the Project of Multi Modal Monitoring Information Learning Fusion and Health Warning Diagnosis of Wind Power Transmission System (Grant No. 61803329), the Research on Product Quality Inspection Method Based on Time Series Analysis (Grant No. 201703A020), and the Research on the Theory and Reliability of Group Coordinated Control of Hydraulic System for Large Engineering Transportation Vehicles (Grant No. 51675461).
Abstract: To address the problem of image information loss, dilated convolution is introduced and a novel multi-scale dilated convolutional neural network (MDCNN) is proposed. Dilated convolution can aggregate multi-scale image information without reducing the resolution. The first layer of the network uses a spectral convolutional step to reduce dimensionality. The multi-scale aggregation stage then extracts multi-scale features by applying dilated convolution and shortcut connections. The extracted features, which represent properties of the data, are fed through Softmax to predict the samples. MDCNN achieved overall accuracies of 99.58% and 99.92% on two public datasets, Indian Pines and Pavia University. Compared with four other existing models, the results illustrate that MDCNN can extract more discriminative features and achieve higher classification performance.
Abstract: Completing defect classification with only a few defect samples is a key challenge in the production of mobile phone screens. This paper proposes an attention-relation network for mobile phone screen defect classification. The architecture of the attention-relation network contains two modules: a feature extraction module and a feature metric module. Unlike other few-shot models, our model applies an attention mechanism to metric learning to measure the distance between features, so as to attend to the correlation between features and suppress unwanted information. In addition, we combine dilated convolution and skip connections to extract more feature information for follow-up processing. We validate the attention-relation network on the mobile phone screen defect dataset. The experimental results show that the classification accuracy of the attention-relation network is 0.9486 under the 5-way 1-shot training strategy and 0.9039 under the 5-way 5-shot setting. It classifies mobile phone screen defects well and clearly outperforms the compared methods.
Funding: Supported by the Sichuan Science and Technology Program (2021YFQ0003, 2023YFSY0026, 2023YFH0004).
Abstract: This study addresses the limitations of Transformer models in image feature extraction, particularly their lack of inductive bias for visual structures. Compared to Convolutional Neural Networks (CNNs), Transformers are more sensitive to the hyperparameters of optimizers, which leads to a lack of stability and slow convergence. To tackle these challenges, we propose the Convolution-based Efficient Transformer Image Feature Extraction Network (CEFormer) as an enhancement of the Transformer architecture. Our model incorporates E-Attention, depthwise separable convolution, and dilated convolution to introduce crucial inductive biases, such as translation invariance, locality, and scale invariance, into the Transformer framework. Additionally, we implement a lightweight convolution module to process the input images, resulting in faster convergence and improved stability. The result is an efficient image feature extraction network that combines convolution with the Transformer. Experimental results on ImageNet1k (Top-1) demonstrate that the proposed network achieves better accuracy while maintaining high computational speed. It achieves up to 85.0% accuracy across various model sizes on image classification, outperforming various baseline models. When integrated into the Mask Region-Convolutional Neural Network (R-CNN) framework as a backbone network, CEFormer outperforms other models and achieves the highest mean Average Precision (mAP) scores. This research presents a significant advancement in Transformer-based image feature extraction, balancing performance and computational efficiency.
Funding: Supported by the National Key Research and Development Program of China (No. 2018YFB2101300), the National Natural Science Foundation of China (Grant No. 61871186), and the Dean's Fund of Engineering Research Center of Software/Hardware Co-Design Technology and Application, Ministry of Education (East China Normal University).
Abstract: Time series forecasting plays an important role in various fields, such as energy, finance, transport, and weather. Temporal convolutional networks (TCNs) based on dilated causal convolution have been widely used in time series forecasting. However, two problems weaken the performance of TCNs. One is that in dilated causal convolution, causal convolution leads to the receptive fields of outputs being concentrated in the earlier part of the input sequence, whereas the recent input information is severely lost. The other is that the distribution shift problem in time series has not been adequately solved. To address the first problem, we propose a subsequence-based dilated convolution method (SDC). By using multiple convolutional filters to convolve elements of neighboring subsequences, the method extracts temporal features from a growing receptive field via a growing subsequence rather than a single element. Ultimately, the receptive field of each output element can cover the whole input sequence. To address the second problem, we propose a difference and compensation method (DCM). The method reduces the discrepancies between and within the input sequences by difference operations and then compensates the outputs for the information lost due to the difference operations. Based on SDC and DCM, we further construct a temporal subsequence-based convolutional network with difference (TSCND) for time series forecasting. The experimental results show that TSCND can reduce prediction mean squared error by 7.3% and save runtime, compared with state-of-the-art models and vanilla TCN.
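For readers unfamiliar with the baseline this abstract builds on, the dilated causal convolution of a vanilla TCN can be sketched in plain Python. This is an illustration of the standard operation only, not of the proposed SDC or DCM methods; the function names and the example dilation schedule `[1, 2, 4, 8]` are assumptions chosen for the example.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """y[t] = sum_i weights[i] * x[t - i * dilation], with zeros before t = 0.

    Only current and past inputs contribute to y[t], which is what makes
    the convolution causal.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:  # implicit left zero-padding
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of input steps seen by one output of a stack of such layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = causal_dilated_conv1d(x, weights=[1.0, 1.0], dilation=2)  # y[t] = x[t] + x[t-2]
rf = receptive_field(kernel_size=2, dilations=[1, 2, 4, 8])
print(y, rf)
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of weights grows only linearly.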
Funding: Funded by the National Natural Science Foundation of China under Grant No. 61172167 and the Science Fund Project of Heilongjiang Province (LH2020F035).
Abstract: Nuclear magnetic resonance imaging of breasts often presents complex backgrounds. Breast tumors exhibit varying sizes, uneven intensity, and indistinct boundaries. These characteristics can lead to challenges such as low accuracy and incorrect segmentation during tumor segmentation. We therefore propose a two-stage breast tumor segmentation method leveraging multi-scale features and boundary attention mechanisms. Initially, the breast region of interest is extracted to isolate the breast area from surrounding tissues and organs. Subsequently, we devise a fusion network incorporating multi-scale features and boundary attention mechanisms for breast tumor segmentation. We incorporate multi-scale parallel dilated convolution modules into the network, enhancing its capability to segment tumors of various sizes through multi-scale convolution and novel fusion techniques. Additionally, attention and boundary detection modules are included to augment the network's capacity to locate tumors by capturing nonlocal dependencies in both spatial and channel domains. Furthermore, a hybrid loss function with boundary weight is employed to address sample class imbalance and to enhance the network's boundary maintenance capability through an additional loss. The method was evaluated using breast data from 207 patients at Ruijin Hospital, resulting in a 6.64% increase in Dice similarity coefficient compared to the benchmark U-Net. Experimental results demonstrate the superiority of the method over other segmentation techniques, with fewer model parameters.
Funding: Funded by the Project of the National Natural Science Foundation of China, Grant Number 72071209.
Abstract: As a core part of battlefield situational awareness, air target intention recognition plays an important role in modern air operations. To address the problems of insufficient feature extraction and misclassification in intention recognition, this paper designs an air target intention recognition method (KGTLIR) based on Knowledge Graph and Deep Learning. First, an intention recognition model based on Deep Learning is constructed to mine the temporal relationship of intention features using dilated causal convolution and the spatial relationship of intention features using a graph attention mechanism. Meanwhile, the accuracy, recall, and F1-score after each iteration are used to dynamically adjust the sample weights to reduce the probability of misclassification. After that, an intention recognition model based on Knowledge Graph is constructed to predict the probability of occurrence of the target's different intentions. Finally, the results of the two models are fused by evidence theory to obtain the target's operational intention. Experiments show that the intention recognition accuracy of the KGTLIR model can reach 98.48%, which is not only better than most air target intention recognition methods but also demonstrates better interpretability and trustworthiness.
Abstract: Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information. Methods: We therefore propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve the feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods; it achieved state-of-the-art results in terms of evaluation metrics such as recall and F1-score. https://github.com/Yong Z-Lee/TD-DCCAM.
Abstract: Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To balance complexity against model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the field of view of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
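The parameter saving of depthwise separable convolution and the enlarged field of view from dilation can be checked with a little arithmetic. The sketch below uses the standard textbook formulas (bias terms ignored); the function names and the channel sizes 64 and 128 are illustrative assumptions and do not come from the abstract.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise

def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k-tap kernel with the given dilation.

    Dilation inserts (dilation - 1) gaps between taps without adding weights.
    """
    return k + (k - 1) * (dilation - 1)

std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
print(std, sep, effective_kernel_size(3, 2))
```

For these example sizes the separable layer needs roughly an eighth of the weights, and a 3 x 3 kernel with dilation 2 covers a 5 x 5 area at no extra parameter cost.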
Funding: Supported in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2021A1515110079, in part by the Fundamental Research Funds for the Central Universities under Grant D5000210966, and in part by the Basic Research Plan in Taicang under Grant TC2021JC23.
Abstract: Deep convolutional neural networks (CNNs) with strong learning abilities have been used in the field of image denoising. However, some CNNs depend on a single deep network to train an image denoising model, which performs poorly in complex screens. To address this problem, we propose a hybrid denoising CNN (HDCNN). HDCNN is composed of a dilated block (DB), a RepVGG block (RVB), a feature refinement block (FB), and a single convolution. DB combines a dilated convolution, batch normalization (BN), common convolutions, and the ReLU activation function to obtain more context information. RVB uses a parallel combination of convolution, BN, and ReLU to extract complementary width features. FB obtains more accurate information by refining the features obtained from the RVB. A single convolution works with a residual learning operation to construct a clean image. These key components give HDCNN good performance in image denoising. Experiments show that the proposed HDCNN achieves a good denoising effect on public datasets.
Funding: Funded by the Natural Science Foundation of Hunan Province under Grant No. 2021JJ31142, author F.J, http://kjt.hunan.gov.cn/.
Abstract: Detecting non-motor drivers' helmets has significant implications for traffic control. Currently, most helmet detection methods are susceptible to complex backgrounds and lack sufficient accuracy and robustness in small object detection, which makes them unsuitable for practical application scenarios. Therefore, this paper proposes a new helmet-wearing detection algorithm based on You Only Look Once version 5 (YOLOv5). First, the Dilated convolution In Coordinate Attention (DICA) layer is added to the backbone network. DICA combines the coordinate attention mechanism with atrous convolution to replace the original convolution layer, which enlarges the receptive field of the network to capture more contextual information. It also reduces the network's learning of unnecessary background features and directs attention to small objects. Second, the Rebuild Bidirectional Feature Pyramid Network (Re-BiFPN) is used as a feature extraction network. Re-BiFPN uses cross-scale feature fusion to combine the high-level semantic features with the low-level spatial features, which helps the model learn object features at different scales. Verified on the proposed "Helmet Wearing dataset for Non-motor Drivers (HWND)," the results show that the proposed model is superior to current detection algorithms, with a mean average precision (mAP) of 94.3% under complex backgrounds.
Abstract: This paper proposes a face-mask object detection model incorporating a hybrid dilation convolutional network, termed ResNet Hybrid-dilation-convolution Face-mask-detector (RHF). Furthermore, a lightweight face-mask dataset named Light Masked Face Dataset (LMFD) and a medium-sized face-mask dataset named Masked Face Dataset (MFD), both with data augmentation methods applied, are constructed in this paper. The hybrid dilation convolutional network is able to expand the perception of the convolutional kernel without concern about the discontinuity of image information during the convolution process. On the two datasets constructed above, the trained models are significantly improved in terms of detection performance, training time, and other related metrics. Using the MFD dataset of 55,905 images, the RHF model requires roughly 10 hours less training time than ResNet50 while giving better detection results, with an mAP of 93.45%.
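The "discontinuity of image information" that hybrid dilation is meant to avoid is often called the gridding effect: stacking dilated layers with the same rate leaves input positions that no output ever sees. The sketch below checks this in one dimension. It is an illustration of the general idea only; the dilation schedules `[2, 2, 2]` and `[1, 2, 5]` are assumed example values and are not stated in the abstract, and the function names are hypothetical.

```python
def reachable_offsets(dilations, kernel_size=3):
    """Input offsets (relative to an output position) touched by a stack of
    1D dilated convolutions with the given dilation rates."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        # Each layer lets every reachable position look at its dilated taps.
        offsets = {o + tap * d for o in offsets for tap in range(-half, half + 1)}
    return offsets

def has_gaps(dilations, kernel_size=3):
    """True if some position inside the receptive field is never sampled."""
    offsets = reachable_offsets(dilations, kernel_size)
    return any(o not in offsets for o in range(min(offsets), max(offsets) + 1))

same_rate = has_gaps([2, 2, 2])  # only even offsets are reached
hybrid = has_gaps([1, 2, 5])     # mixed rates cover every offset in the field
print(same_rate, hybrid)
```

In this example the mixed schedule also reaches a wider field (offsets from -8 to 8) than the repeated rate (offsets from -6 to 6) with the same number of layers.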
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 61872204), the Natural Science Fund of Heilongjiang Province, China (Grant No. F2017029), the Scientific Research Project of Heilongjiang Provincial Universities, China (Grant No. 135109236), and the Graduate Research Project, China (Grant No. YJSCX2019042).
Abstract: Traditional compressed sensing algorithms reconstruct images by iteratively optimizing a small number of measured values. The computation is complex and the reconstruction time is long. Deep learning-based compressed sensing algorithms can greatly shorten the reconstruction time, but they mostly focus on the reconstruction network. The random measurement matrix cannot measure the image features well, which limits the improvement in reconstructed image quality. Two networks are proposed to solve this problem. The first is IReconNet, an improved version of ReconNet, which replaces the traditional linear random measurement matrix with an adaptive nonlinear measurement network. The reconstruction quality and anti-noise performance are greatly improved. Because the measured values extracted by the measurement network also retain the spatial information of the image, the image can be reconstructed by bilinear interpolation (Bilinear) and dilated convolution; a second network, USDCNN, is therefore proposed. On the BSD500 dataset, at sampling rates of 0.25, 0.10, 0.04, and 0.01, the average peak signal-to-noise ratio (PSNR) of USDCNN is 1.62 dB, 1.31 dB, 1.47 dB, and 1.95 dB higher than that of MSRNet, respectively. Experiments show that the average reconstruction time of USDCNN is 0.2705 s, 0.3671 s, 0.3602 s, and 0.3929 s faster than that of ReconNet. Moreover, USDCNN also has a great advantage in anti-noise performance.
Funding: This research is supported by the Ministry of Education, Saudi Arabia, under Project Number QURDO001.
Abstract: With the rapid progress of deep convolutional neural networks, several applications of crowd counting have been proposed and explored in the literature. For congested scene monitoring, a variety of crowd density estimation approaches have been developed. Understanding highly congested scenes for crowd counting during the Muslim gatherings of Hajj and Umrah is a challenging task: a large number of individuals stand close together, the crowd can vary from low density to high density, and detection techniques find it hard to recognize individuals. To deal with such highly congested scenes, we propose the Congested Scene Crowd Counting Network (CSCC-Net), which uses the first ten layers of VGG-16 as its core network because of its strong and robust transfer learning ability. A hole dilated convolutional neural network is used at the back end to widen the receptive field and extract a large range of information from the image without losing its original resolution. The dilated convolutional neural network is mainly chosen to expand the kernel size without changing other parameters. Moreover, several loss functions have been applied to strengthen the evaluation accuracy of the model. Finally, all experiments have been evaluated on prominent datasets, namely ShanghaiTech parts A and B, UCF_CC_50, and UCF_QNRF. Our model achieved 68.0 and 9.0 MAE on ShanghaiTech parts A and B, 199.1 MAE on UCF_CC_50, and 99.8 on UCF_QNRF, respectively.
Funding: Sponsored by the Special Project of China Earthquake Administration (ZX1903006), the Earthquake Science Spark Program of China Earthquake Administration (XH16037), and the Science and Technology Program of Gansu Province (17JR5RA338).
Abstract: Deep learning methods have made numerous achievements in anomaly detection for time series. We introduce the speech production model from the field of artificial intelligence, changing the convolution layer of the general convolutional neural network to a residual element structure by adding identity mapping, and expanding the receptive field of the model by using dilated causal convolution. Based on the dilated causal convolution network and the log probability density function, anomalous events are detected according to their anomaly scores. The validity of the method is verified on simulation data, and the method is then applied to the data actually observed at the Pingliang geoelectric field observation station in Gansu Province. The results show that one month before the Wenchuan M_S 8.0, Lushan M_S 7.0, and Minxian-Zhangxian M_S 6.6 earthquakes, the daily cumulative error of the log probability density of the predicted results at Pingliang Station suddenly decreases, which is consistent with the actual earthquake anomalies in a certain time range. After analyzing combined factors including the spatial electromagnetic environment and the variation of micro fissures before the earthquakes, we explain the possible causes of the anomalies in the geoelectric field before the earthquakes. The successful application of deep learning to observed geoelectric field data may help improve both the utilization rate of the data and the efficiency of detection. In addition, it may provide technical support for further seismic research.
Funding: Supported by the Sichuan Science and Technology Program (2023YFSY0026, 2023YFH0004) and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA)).
Abstract: Two-dimensional endoscopic images are susceptible to interference such as specular reflections and monotonous texture illumination, which hinders accurate three-dimensional lesion reconstruction by surgical robots. This study proposes a novel end-to-end disparity estimation model to address these challenges. Our approach combines a Pseudo-Siamese neural network architecture with pyramid dilated convolutions, integrating multi-scale image information to enhance robustness against lighting interference. This study introduces a Pseudo-Siamese structure-based disparity regression model that simplifies left-right image comparison, improving accuracy and efficiency. The model was evaluated using a dataset of stereo endoscopic videos captured by the Da Vinci surgical robot, comprising simulated silicone heart sequences and real heart video data. Experimental results demonstrate significant improvement in the network's resistance to lighting interference without substantially increasing parameters. Moreover, the model exhibited faster convergence during training, contributing to overall performance enhancement. This study advances endoscopic image processing accuracy and has potential implications for surgical robot applications in complex environments.
Funding: This research project was partially supported by the National Natural Science Foundation of China (Grant Nos. 62072015, U19B2039, U1811463) and the National Key R&D Program of China (Grant No. 2018YFB1600903).
Abstract: Crowd counting provides an important foundation for public security and urban management. Due to the existence of small targets and large density variations in crowd images, crowd counting is a challenging task. Mainstream methods usually apply convolutional neural networks (CNNs) to regress a density map, which requires annotations of individual persons and counts. Weakly-supervised methods can avoid detailed labeling and only require counts as annotations of images, but existing methods fail to achieve satisfactory performance because a global perspective field and multi-level information are usually ignored. We propose a weakly-supervised method, DTCC, which effectively combines multi-level dilated convolution and transformer methods to realize end-to-end crowd counting. Its main components include a recursive swin transformer and a multi-level dilated convolution regression head. The recursive swin transformer combines a pyramid visual transformer with a fine-tuned recursive pyramid structure to capture deep multi-level crowd features, including global features. The multi-level dilated convolution regression head includes multi-level dilated convolution and a linear regression head for the feature extraction module. This module can capture both low- and high-level features simultaneously to enhance the receptive field. In addition, two regression head fusion mechanisms realize dynamic and mean fusion counting. Experiments on four well-known benchmark crowd counting datasets (UCF_CC_50, ShanghaiTech, UCF_QNRF, and JHU-Crowd++) show that DTCC achieves results superior to other weakly-supervised methods and comparable to fully-supervised methods.