770 articles found
An improved deep dilated convolutional neural network for seismic facies interpretation
1
Authors: Na-Xia Yang, Guo-Fa Li, Ting-Hui Li, Dong-Feng Zhao, Wei-Wei Gu. 《Petroleum Science》 SCIE EI CAS CSCD, 2024, No. 3, pp. 1569-1583 (15 pages)
With the successful application and breakthrough of deep learning in image segmentation, seismic facies interpretation using convolutional neural networks has developed continuously. These intelligent, automated methods significantly reduce manual labor, particularly in the laborious task of manually labeling seismic facies. However, their heavy demand for training data limits wider application. To overcome this challenge, we adopt the UNet architecture as the foundational network for seismic facies classification, since it has demonstrated effective segmentation even with small-sample training data. Additionally, we integrate spatial pyramid pooling and dilated convolution modules into the network to enhance the perception of spatial information across a broader range. A seismic facies classification test on public data from the F3 block verifies the superior performance of the improved network in delineating seismic facies boundaries. Comparative analysis against the traditional UNet model shows that our method achieves more accurate classification, as evidenced by various evaluation metrics for image segmentation; the classification accuracy reaches 96%. Furthermore, seismic facies classification results in the seismic slice dimension further confirm the superior performance of the proposed method, which accurately defines the extent of the different seismic facies. This approach holds significant potential for analyzing geological patterns and extracting valuable depositional information.
Keywords: seismic facies interpretation; dilated convolution; spatial pyramid pooling; internal feature maps; compound loss function
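The dilated convolution named in the entry above can be illustrated in one dimension. The following is a minimal sketch in plain Python, not the authors' network: the function names `dilated_conv1d` and `effective_kernel_size` are hypothetical, and the example assumes a "valid" (unpadded) cross-correlation.

```python
def effective_kernel_size(kernel_size, dilation):
    """Extent of input covered by one dilated kernel: (k - 1) * d + 1."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D dilated cross-correlation.

    Kernel taps are applied to samples that are `dilation` positions apart,
    so the kernel sees a wider window without gaining extra weights.
    """
    span = effective_kernel_size(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        acc = 0
        for j, weight in enumerate(kernel):
            acc += weight * signal[start + j * dilation]
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    print(dilated_conv1d(x, [1, 1, 1], dilation=1))
    print(dilated_conv1d(x, [1, 1, 1], dilation=2))
```

With dilation 2, a 3-tap kernel spans 5 input samples while still holding 3 weights, which is the sense in which dilation widens spatial context at no parameter cost.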
Convolution-Transformer for Image Feature Extraction
2
Authors: Lirong Yin, Lei Wang, Siyu Lu, Ruiyang Wang, Youshuai Yang, Bo Yang, Shan Liu, Ahmed AlSanad, Salman A. AlQahtani, Zhengtong Yin, Xiaolu Li, Xiaobing Chen, Wenfeng Zheng. 《Computer Modeling in Engineering & Sciences》 SCIE EI, 2024, No. 10, pp. 87-106 (20 pages)
This study addresses the limitations of Transformer models in image feature extraction, particularly their lack of inductive bias for visual structures. Compared to Convolutional Neural Networks (CNNs), Transformers are more sensitive to optimizer hyperparameters, which leads to a lack of stability and slow convergence. To tackle these challenges, we propose the Convolution-based Efficient Transformer Image Feature Extraction Network (CEFormer) as an enhancement of the Transformer architecture. Our model incorporates E-Attention, depthwise separable convolution, and dilated convolution to introduce crucial inductive biases, such as translation invariance, locality, and scale invariance, into the Transformer framework. Additionally, we implement a lightweight convolution module to process the input images, resulting in faster convergence and improved stability. The result is an efficient image feature extraction network that combines convolution with the Transformer. Experimental results on ImageNet-1k (top-1 accuracy) demonstrate that the proposed network achieves better accuracy while maintaining high computational speed. It achieves up to 85.0% accuracy across various model sizes on image classification, outperforming various baseline models. When integrated into the Mask Region-based Convolutional Neural Network (Mask R-CNN) framework as a backbone network, CEFormer outperforms other models and achieves the highest mean Average Precision (mAP) scores. This research presents a significant advancement in Transformer-based image feature extraction, balancing performance and computational efficiency.
Keywords: Transformer; E-Attention; depth convolution; dilated convolution; CEFormer
TSCND: Temporal Subsequence-Based Convolutional Network with Difference for Time Series Forecasting
3
Authors: Haoran Huang, Weiting Chen, Zheming Fan. 《Computers, Materials & Continua》 SCIE EI, 2024, No. 3, pp. 3665-3681 (17 pages)
Time series forecasting plays an important role in various fields, such as energy, finance, transport, and weather. Temporal convolutional networks (TCNs) based on dilated causal convolution have been widely used in time series forecasting. However, two problems weaken the performance of TCNs. One is that in dilated causal convolution, causal convolution leads to the receptive fields of outputs being concentrated in the earlier part of the input sequence, whereas the recent input information will be severely lost. The other is that the distribution shift problem in time series has not been adequately solved. To address the first problem, we propose a subsequence-based dilated convolution method (SDC). By using multiple convolutional filters to convolve elements of neighboring subsequences, the method extracts temporal features from a growing receptive field via a growing subsequence rather than a single element. Ultimately, the receptive field of each output element can cover the whole input sequence. To address the second problem, we propose a difference and compensation method (DCM). The method reduces the discrepancies between and within the input sequences by difference operations and then compensates the outputs for the information lost due to difference operations. Based on SDC and DCM, we further construct a temporal subsequence-based convolutional network with difference (TSCND) for time series forecasting. The experimental results show that TSCND can reduce prediction mean squared error by 7.3% and save runtime, compared with state-of-the-art models and vanilla TCN.
Keywords: difference; data prediction; time series; temporal convolutional network; dilated convolution
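The dilated causal convolution that the entry above takes as its baseline can be sketched as follows. This is an illustration of the vanilla TCN building block only, not of the SDC or DCM methods the paper proposes; the function names and the zero left-padding convention are assumptions made for the example.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """Causal dilated convolution with implicit zero left-padding.

    Convention (an assumption of this sketch): kernel[0] weights the current
    sample x[t], kernel[j] weights x[t - j * dilation]. No future sample is
    ever read, so output[t] depends only on inputs at or before t.
    """
    outputs = []
    for t in range(len(x)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:  # positions before the sequence start count as zero
                acc += weight * x[idx]
        outputs.append(acc)
    return outputs


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked causal layers: 1 + (k - 1) * sum(dilations)."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    print(causal_dilated_conv1d([1, 2, 3, 4], [1, 1], dilation=2))
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is why TCNs can cover long histories with few layers.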
A Lightweight Convolutional Neural Network with Hierarchical Multi-Scale Feature Fusion for Image Classification
4
Authors: Adama Dembele, Ronald Waweru Mwangi, Ananda Omutokoh Kube. 《Journal of Computer and Communications》, 2024, No. 2, pp. 173-200 (28 pages)
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in the DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
Keywords: MobileNet; image classification; lightweight convolutional neural network; depthwise dilated separable convolution; hierarchical multi-scale feature fusion
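The parameter saving from depthwise separable convolution, on which the entry above builds, follows from simple counting. The sketch below is generic arithmetic, not the paper's DDSC layer; the function names are hypothetical, bias terms are ignored, and the channel sizes are illustrative.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv.

    Dilation does not appear here: spacing the taps apart changes the
    receptive field but not the number of weights.
    """
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    std = standard_conv_params(32, 64, 3)
    sep = depthwise_separable_params(32, 64, 3)
    print(std, sep, sep / std)
```

The ratio of the two counts equals 1/c_out + 1/k², so for 3 x 3 kernels the separable form needs roughly one eighth to one ninth of the weights when c_out is large.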
Advanced Face Mask Detection Model Using Hybrid Dilation Convolution Based Method (cited by: 1)
5
Authors: Shaohan Wang, Xiangyu Wang, Xin Guo. 《Journal of Software Engineering and Applications》, 2023, No. 1, pp. 1-19 (19 pages)
A face-mask object detection model incorporating a hybrid dilation convolutional network, termed ResNet Hybrid-dilation-convolution Face-mask-detector (RHF), is proposed in this paper. Furthermore, a lightweight face-mask dataset named Light Masked Face Dataset (LMFD) and a medium-sized face-mask dataset named Masked Face Dataset (MFD), both with data augmentation applied, are also constructed. The hybrid dilation convolutional network expands the perception of the convolutional kernel without the discontinuity of image information that dilation can otherwise introduce during convolution. On the two constructed datasets, the trained models are significantly improved in detection performance, training time, and other related metrics. On the MFD dataset of 55,905 images, the RHF model requires roughly 10 hours less training time than ResNet50 and achieves better detection results, with an mAP of 93.45%.
Keywords: face mask detection; object detection; hybrid dilation convolution; computer vision
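The "discontinuity of image information" mentioned in the entry above is often called the gridding effect: stacking layers with the same dilation rate leaves input positions that no output ever reads. The sketch below checks this in one dimension; the function names are hypothetical, and the rate sets [2, 2, 2] and [1, 2, 5] are illustrative choices, not values taken from this paper.

```python
def covered_offsets(dilations, kernel_size=3):
    """Input offsets (relative to one output position) reached by a stack of
    1D convolutions with the given dilation rates and an odd kernel size."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        taps = [j * d for j in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return sorted(offsets)


def has_gaps(dilations, kernel_size=3):
    """True if some position inside the receptive field is never read."""
    offs = covered_offsets(dilations, kernel_size)
    return len(offs) != offs[-1] - offs[0] + 1


if __name__ == "__main__":
    print(covered_offsets([2, 2, 2]), has_gaps([2, 2, 2]))
    print(covered_offsets([1, 2, 5]), has_gaps([1, 2, 5]))
```

A repeated rate of 2 only ever touches even offsets, whereas mixing rates fills the receptive field without holes, which is the motivation for hybrid dilation schemes.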
DcNet: Dilated Convolutional Neural Networks for Side-Scan Sonar Image Semantic Segmentation (cited by: 2)
6
Authors: ZHAO Xiaohong, QIN Rixia, ZHANG Qilei, YU Fei, WANG Qi, HE Bo. 《Journal of Ocean University of China》 SCIE CAS CSCD, 2021, No. 5, pp. 1089-1096 (8 pages)
In ocean explorations, side-scan sonar (SSS) plays a very important role and can quickly depict seabed topography. Assembling the SSS to an autonomous underwater vehicle (AUV) and performing semantic segmentation of an SSS image in real time can realize online submarine geomorphology or target recognition, which is conducive to submarine detection. However, because of the complexity of the marine environment, various noises in the ocean pollute the sonar image, which also suffers from intensity inhomogeneity. In this paper, we propose a novel neural network architecture named dilated convolutional neural network (DcNet) that can run in real time while addressing the above-mentioned issues and providing accurate semantic segmentation. The proposed architecture is an encoder-decoder network in which the encoder gradually reduces the spatial dimension of the input image and the decoder recovers the details of the target. The core of our network is a novel block connection named DCblock, which mainly uses dilated convolution and depthwise separable convolution between the encoder and decoder to attain more context while still retaining high accuracy. Furthermore, our proposed method performs a super-resolution reconstruction to enlarge the dataset with high-quality images. We compared our network to other common semantic segmentation networks on an NVIDIA Jetson TX2 using our sonar image datasets. Experimental results show that while the inference speed of the proposed network significantly outperforms state-of-the-art architectures, the accuracy of our method is still comparable, which indicates its potential applications not only in AUVs equipped with SSS but also in marine exploration.
Keywords: side-scan sonar (SSS); semantic segmentation; dilated convolutions; super-resolution
Long Text Classification Algorithm Using a Hybrid Model of Bidirectional Encoder Representation from Transformers-Hierarchical Attention Networks-Dilated Convolutions Network (cited by: 1)
7
Authors: ZHAO Yuanyuan, GAO Shining, LIU Yang, GONG Xiaohui. 《Journal of Donghua University(English Edition)》 CAS, 2021, No. 4, pp. 341-350 (10 pages)
Text makes up most of the resources on the Internet, which places ever higher demands on the accuracy of text classification. In this manuscript, we first design a hybrid model of bidirectional encoder representation from transformers, hierarchical attention networks, and dilated convolutions networks (BERT_HAN_DCN), which is based on the BERT pre-trained model and its superior feature extraction ability. The advantages of the HAN and DCN models are both exploited, which helps capture abundant semantic information by fusing context semantic features and hierarchical characteristics. Second, the traditional softmax algorithm increases the learning difficulty of samples of the same class, making similar features harder to distinguish; AM-softmax is therefore introduced to replace the traditional softmax. Finally, the fused model is validated: on two datasets the hybrid model shows superior accuracy and F1-score compared with general single models such as HAN and DCN built on the BERT pre-trained model. In addition, the improved AM-softmax network model is superior to the general softmax network model.
Keywords: long text classification; dilated convolution; BERT; fusing context semantic features; hierarchical characteristics; BERT_HAN_DCN; AM-softmax
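The AM-softmax loss adopted in the entry above differs from ordinary softmax cross-entropy in one step: an additive margin m is subtracted from the target-class cosine similarity before scaling by s. The following single-sample sketch assumes the cosines between the feature and each class weight vector are already computed; the function name and the defaults s = 30 and m = 0.35 are illustrative assumptions, not settings reported for BERT_HAN_DCN.

```python
import math


def am_softmax_loss(cosines, target, s=30.0, m=0.35):
    """Additive-margin softmax loss for one sample.

    cosines: cosine similarity between the feature and each class weight.
    target:  index of the true class.
    The target logit becomes s * (cos - m); other logits stay s * cos.
    """
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    cos = [0.8, 0.1, -0.2]
    print(am_softmax_loss(cos, target=0, m=0.0))   # plain scaled softmax
    print(am_softmax_loss(cos, target=0, m=0.35))  # with additive margin
```

Setting m = 0 recovers scaled softmax cross-entropy; a positive margin raises the loss for the same cosines, so the target cosine has to exceed the others by at least the margin before the loss becomes small.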
Multi-Scale Dilated Convolutional Neural Network for Hyperspectral Image Classification
8
Authors: Shanshan Zheng, Wen Liu, Rui Shan, Jingyi Zhao, Guoqian Jiang, Zhi Zhang. 《Journal of Harbin Institute of Technology(New Series)》 CAS, 2021, No. 4, pp. 25-32 (8 pages)
To address image information loss, dilated convolution is introduced and a novel multi-scale dilated convolutional neural network (MDCNN) is proposed. Dilated convolution can aggregate multi-scale image information without reducing the resolution. The first layer of the network uses a spectral convolution step to reduce dimensionality. The multi-scale aggregation stage then extracts multi-scale features by applying dilated convolution and shortcut connections. The extracted features, which represent properties of the data, are fed through Softmax to predict the sample classes. MDCNN achieves overall accuracies of 99.58% and 99.92% on two public datasets, Indian Pines and Pavia University. Compared with four other existing models, the results show that MDCNN extracts more discriminative features and achieves higher classification performance.
Keywords: multi-scale aggregation; dilated convolution; hyperspectral image classification (HSIC); shortcut connection
Multi-Classification of Polyps in Colonoscopy Images Based on an Improved Deep Convolutional Neural Network (cited by: 1)
9
Authors: Shuang Liu, Xiao Liu, Shilong Chang, Yufeng Sun, Kaiyuan Li, Ya Hou, Shiwei Wang, Jie Meng, Qingliang Zhao, Sibei Wu, Kun Yang, Linyan Xue. 《Computers, Materials & Continua》 SCIE EI, 2023, No. 6, pp. 5837-5852 (16 pages)
Achieving accurate classification of colorectal polyps during colonoscopy can avoid unnecessary endoscopic biopsy or resection. This study aimed to develop a deep learning model that can automatically classify colorectal polyps histologically on white-light and narrow-band imaging (NBI) colonoscopy images based on World Health Organization (WHO) and Workgroup serrAted polypS and Polyposis (WASP) classification criteria for colorectal polyps. White-light and NBI colonoscopy images of colorectal polyps with pathological results were first collected and classified into four categories: conventional adenoma, hyperplastic polyp, sessile serrated adenoma/polyp (SSAP), and normal. Conventional adenoma was further divided into three sub-categories (tubular adenoma, villous adenoma, and tubulovillous adenoma), and the images were then re-classified into six categories. In this paper, we propose a novel convolutional neural network termed Polyp-DedNet for the four- and six-category classification tasks of colorectal polyps. Based on the existing classification network ResNet50, Polyp-DedNet adopts dilated convolution to retain more high-dimensional spatial information and an Efficient Channel Attention (ECA) module to further improve classification performance. To eliminate gridding artifacts caused by dilated convolutions, traditional convolutional layers are used instead of the max pooling layer, and two convolutional layers with progressively decreasing dilation are added at the end of the network. Because medical image data are inevitably imbalanced, the DropBlock regularization method and a Class-Balanced (CB) Loss are applied to prevent network overfitting. Furthermore, 5-fold cross-validation is adopted to estimate the performance of Polyp-DedNet on the multi-classification task. Mean accuracies of the proposed Polyp-DedNet for the four- and six-category classifications of colorectal polyps were 89.91%±0.92% and 85.13%±1.10%, respectively. Precision, recall, and F1-score also improved by 1% to 2% compared to the baseline ResNet50. The proposed Polyp-DedNet presents state-of-the-art performance for colorectal polyp classification on white-light and NBI colonoscopy images, highlighting its considerable potential as an AI-assistant system for accurate colorectal polyp diagnosis in colonoscopy.
Keywords: colorectal polyps; four- and six-category classifications; convolutional neural network; dilated residual network
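The Class-Balanced (CB) Loss used in the entry above reweights each class by the inverse of its "effective number" of samples, (1 - β^n) / (1 - β). The sketch below computes only the per-class weights, under the usual convention that they are rescaled to sum to the number of classes; the function name, the β value, and the class counts are illustrative assumptions, not figures from the paper.

```python
def class_balanced_weights(counts, beta=0.999):
    """Per-class weights proportional to (1 - beta) / (1 - beta ** n_c).

    counts: number of training samples per class (each >= 1).
    beta:   hyperparameter in [0, 1); beta = 0 gives uniform weights and
            beta close to 1 approaches inverse-frequency weighting.
    Weights are rescaled so that they sum to the number of classes.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


if __name__ == "__main__":
    # Hypothetical imbalanced counts for three classes.
    print(class_balanced_weights([1000, 100, 10]))
```

These weights would then multiply the per-sample loss (for example cross-entropy) of the corresponding class, so that rare classes contribute more to the gradient.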
1D-CNN: Speech Emotion Recognition System Using a Stacked Network with Dilated CNN Features (cited by: 6)
10
Authors: Mustaqeem, Soonil Kwon. 《Computers, Materials & Continua》 SCIE EI, 2021, No. 6, pp. 4039-4059 (21 pages)
Emotion recognition from speech data is an active and emerging area of research that plays an important role in numerous applications, such as robotics, virtual reality, behavior assessments, and emergency call centers. Recently, researchers have developed many techniques in this field to improve accuracy using several deep learning approaches, but the recognition rate is still not convincing. Our main aim is to develop a new technique that increases the recognition rate at a reasonable computational cost. In this paper, we propose a one-dimensional dilated convolutional neural network (1D-DCNN) for speech emotion recognition (SER) that utilizes hierarchical features learning blocks (HFLBs) with a bi-directional gated recurrent unit (BiGRU). We designed a one-dimensional CNN to enhance the speech signals using spectral analysis and to extract hidden patterns from the speech signals, which are fed into a stacked one-dimensional dilated network whose blocks are called HFLBs. Each HFLB contains one dilated convolution layer (DCL), one batch normalization (BN) layer, and one leaky_relu (ReLU) layer in order to extract the emotional features using a hierarchical correlation strategy. Furthermore, the learned emotional features are fed into a BiGRU in order to adjust the global weights and to recognize the temporal cues. The final state of the deep BiGRU is passed through a softmax classifier to produce the probabilities of the emotions. The proposed model was evaluated on three benchmark datasets, IEMOCAP, EMO-DB, and RAVDESS, on which it achieved 72.75%, 91.14%, and 78.01% accuracy, respectively.
Keywords: affective computing; one-dimensional dilated convolutional neural network; emotion recognition; gated recurrent unit; raw audio clips
DTCC: Multi-level dilated convolution with transformer for weakly-supervised crowd counting
11
Authors: Zhuangzhuang Miao, Yong Zhang, Yuan Peng, Haocheng Peng, Baocai Yin. 《Computational Visual Media》 SCIE EI CSCD, 2023, No. 4, pp. 859-873 (15 pages)
Crowd counting provides an important foundation for public security and urban management. Due to the existence of small targets and large density variations in crowd images, crowd counting is a challenging task. Mainstream methods usually apply convolutional neural networks (CNNs) to regress a density map, which requires annotations of individual persons and counts. Weakly-supervised methods can avoid detailed labeling and only require counts as annotations of images, but existing methods fail to achieve satisfactory performance because a global perspective field and multi-level information are usually ignored. We propose a weakly-supervised method, DTCC, which effectively combines multi-level dilated convolution and transformer methods to realize end-to-end crowd counting. Its main components include a recursive swin transformer and a multi-level dilated convolution regression head. The recursive swin transformer combines a pyramid visual transformer with a fine-tuned recursive pyramid structure to capture deep multi-level crowd features, including global features. The multi-level dilated convolution regression head includes multi-level dilated convolution and a linear regression head for the feature extraction module. This module can capture both low- and high-level features simultaneously to enhance the receptive field. In addition, two regression head fusion mechanisms realize dynamic and mean fusion counting. Experiments on four well-known benchmark crowd counting datasets (UCF_CC_50, ShanghaiTech, UCF_QNRF, and JHU-Crowd++) show that DTCC achieves results superior to other weakly-supervised methods and comparable to fully-supervised methods.
Keywords: crowd counting; Transformer; dilated convolution; global perspective field; pyramid
Hard-rock tunnel lithology identification using multiscale dilated convolutional attention network based on tunnel face images
12
Authors: Wenjun ZHANG, Wuqi ZHANG, Gaole ZHANG, Jun HUANG, Minggeng LI, Xiaohui WANG, Fei YE, Xiaoming GUAN. 《Frontiers of Structural and Civil Engineering》 SCIE EI CSCD, 2023, No. 12, pp. 1796-1812 (17 pages)
For real-time classification of rock masses in hard-rock tunnels, quick determination of the rock lithology on the tunnel face during construction is essential. Motivated by current breakthroughs in artificial intelligence technology in machine vision, a new automatic detection approach for classifying tunnel lithology based on tunnel face images was developed. The method benefits from residual learning for training a deep convolutional neural network (DCNN), and a multi-scale dilated convolutional attention block is proposed. The block with different dilation rates can provide various receptive fields, and thus it can extract multi-scale features. Moreover, the attention mechanism is utilized to select the salient features adaptively and further improve the performance of the model. In this study, an initial image data set made up of photographs of tunnel faces consisting of basalt, granite, siltstone, and tuff was first collected. After classifying and enhancing the training, validation, and testing data sets, a new image data set was generated. A comparison of the experimental findings demonstrated that the suggested approach outperforms previous classifiers in terms of various indicators, including accuracy, precision, recall, F1-score, and computing time. Finally, a visualization analysis was performed to explain the process of the network in the classification of tunnel lithology through feature extraction. Overall, this study demonstrates the potential of using artificial intelligence methods for in situ rock lithology classification utilizing geological images of the tunnel face.
Keywords: hard-rock tunnel face; intelligent lithology identification; multi-scale dilated convolutional attention network; image classification; deep learning
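A data-driven model for crack identification in semi-infinite media (cited by: 1)
13
Authors: 江守燕, 邓王涛, 孙立国, 杜成斌. 《力学学报》 EI CAS CSCD PKU Core, 2024, No. 6, pp. 1727-1739 (13 pages)
Defect identification is an important research topic in structural health monitoring and provides important guidance for assessing the safety of engineering structures; however, accurately determining the size of structural defects is very difficult. This paper proposes an innovative data-driven algorithm that combines the scaled boundary finite element method (SBFEM) with an autoencoder (AE) and a causal dilated convolutional neural network (CDCNN) to identify cracks in semi-infinite media. In this model, SBFEM simulates wave propagation in semi-infinite media containing different crack-like defects; for different crack-like defects, only the scaling center at the crack tip and the positions of the nodes at the crack mouth need to be changed, which avoids complex remeshing and allows sufficient training data to be generated efficiently. To simulate wave propagation in the semi-infinite medium, an absorbing boundary model based on Rayleigh damping is established, which avoids computing the full-domain model of the structure. The CDCNN preserves the ordering of the time-series data and obtains a larger receptive field without increasing network complexity, so it can capture more historical information. The AE has strong nonlinear feature extraction ability and maps the high-dimensional raw input feature space to a low-dimensional latent feature space; the resulting low-dimensional latent features are used to train the network, which effectively improves learning efficiency. Numerical examples show that the proposed model identifies quantitative crack information in semi-infinite media efficiently and accurately, and that the identification efficiency of the AE-CDCNN model is improved by about 2.7 times compared with the CDCNN model alone.
Keywords: data-driven; scaled boundary finite element method; autoencoder; causal dilated convolutional neural network; crack identification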
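A variable-scale VS-UNet model for road crack detection (cited by: 1)
14
Authors: 赵志宏, 何朋, 郝子晔. 《湖南大学学报(自然科学版)》 EI CAS CSCD PKU Core, 2024, No. 6, pp. 63-72 (10 pages)
To address the low detection accuracy of existing image segmentation algorithms and their lack of specialization for crack detection, a multi-scale feature fusion approach is adopted and an extended LG Block module, Extend-LG Block, is proposed; it consists of multiple parallel dilated convolutions with different dilation rates. The number of branches and the dilation rates are adjustable through parameters, which changes the receptive field size and allows crack features at different scales to be extracted and fused. After comparing the strengths and weaknesses of networks that use multi-scale feature fusion modules in deep layers and networks that use a fixed-scale structure for multi-scale feature fusion, a variable-scale UNet model, VS-UNet, is proposed, in which several Extend-LG Blocks with different parameters replace the basic convolution blocks of UNet. This structure performs multi-scale feature fusion in the shallow layers, and the number of scales extracted by the fusion modules gradually decreases as the network deepens. The structure strengthens the extraction of fine image details while preserving the original ability to extract abstract features, and it avoids increasing the number of network parameters. Experiments on the DeepCrack and CFD datasets show that, compared with the other two structures and methods, the proposed variable-scale network achieves higher detection accuracy and, in visual comparisons, segments cracks of all sizes better. Finally, a comparison with other image segmentation algorithms shows that all metrics improve to some degree over UNet, which demonstrates the effectiveness of the network modifications. The results can serve as a reference for further improving road crack detection.
Keywords: U-Net; multi-scale; crack detection; dilated convolution; deep learning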
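A multi-task traffic scene detection model based on cross-attention (cited by: 1)
15
Authors: 牛国臣, 王晓楠. 《北京航空航天大学学报》 EI CAS CSCD PKU Core, 2024, No. 5, pp. 1491-1499 (9 pages)
Perception is the foundation and key of autonomous driving, but most single models cannot simultaneously perform multiple detection tasks such as traffic objects, drivable areas, and lane lines. A multi-task traffic scene detection model based on cross-attention is proposed that detects traffic objects, drivable areas, and lane lines simultaneously. An encoder-decoder network extracts initial features, hybrid dilated convolution strengthens them, and a cross-attention module produces segmentation and detection feature maps. Semantic segmentation is performed on the segmentation feature map and object detection on the detection feature map. Experimental results show that, on the challenging BDD100K dataset, the proposed model outperforms other multi-task models in task accuracy and overall computational efficiency.
Keywords: attention mechanism; multi-task learning; autonomous driving; object detection; hybrid dilated convolution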
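Deep-learning-based intelligent extraction of fractures from coal-rock Micro-CT images and its application
16
Authors: 王登科, 房禹, 魏建平, 张宏图, 赵立桢, 王龙航, 夏缘帝, 李璐, 王少璞, 张强, 任海慧. 《煤炭学报》 EI CAS CSCD PKU Core, 2024, No. 8, pp. 3439-3452 (14 pages)
To address the influence of gangue and the recognition of fractures at different scales in coal-rock CT fracture images, a deep-learning-based coal-rock fracture extraction network (MCSN) is designed and implemented. The model is based on U-Net and uses its encoder-decoder structure and skip connections to segment complete fracture structures from complex coal-rock bodies. First, after internal scan images of the coal-rock body are acquired with an industrial CT scanning system, the fracture structures in the CT images are annotated manually, and data augmentation is used to expand the annotated raw data into a coal-rock CT fracture dataset. Then, the weights of a trained VGG16 model are transferred to the U-Net encoder through transfer learning, giving the backbone feature extraction network a stronger ability to extract fracture features. In addition, a depthwise separable dilated convolution module (DCAC) and a residual module are used to improve the U-Net decoder, which effectively improves recognition of fracture structures in CT images and yields superior segmentation accuracy and robustness. To verify the effectiveness of the proposed network, the MCSN results are compared with those of classical convolutional neural networks and threshold segmentation methods; the comparison shows clear advantages for the proposed model in both qualitative and quantitative analysis. This multi-scale fusion strategy effectively extracts fractures from complex coal-rock images and improves the efficiency and accuracy of fracture recognition. The model is applied to fracture recognition in boreholes in roadway surrounding rock: fractures are extracted from the borehole videos and unwrapped planar images acquired by a borehole imager, and the two sets of results are cross-validated to obtain an accurate distribution range of fractures in the surrounding rock. This gives the grouting and sealing range for cross-measure drainage boreholes and increases the volume fraction of gas drained from the coal seam.
Keywords: fracture recognition and extraction; CT scanning; deep learning; convolutional neural network; dilated convolution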
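A lightweight DeepLabV3+ method for land-cover segmentation of remote sensing images
17
Authors: 马静, 郭中华, 马志强, 马小艳, 李迦龙. 《液晶与显示》 CAS CSCD PKU Core, 2024, No. 8, pp. 1001-1013 (13 pages)
To reduce the errors caused by loss of detail and class imbalance when DeepLabV3+ is applied to land-cover segmentation of remote sensing images, a DeepLabV3+ segmentation method based on a lightweight network is proposed. First, MobileNetV2 replaces the backbone of the original baseline network, which improves training efficiency and reduces model complexity. Second, the dilation rates of the dilated convolutions in the ASPP structure are increased and max pooling is used in the last ASPP layer to capture context at different scales effectively; an SE attention mechanism is introduced into each ASPP branch, and an ECA attention mechanism is introduced after shallow feature extraction, which improves the model's sensitivity to different classes and details. Finally, a weighted joint Dice-Focal loss function is used for optimization to handle class imbalance. The improved model is validated on the CCF and Huawei Ascend Cup competition datasets. Experimental results show that the proposed method improves every metric on both test sets relative to the original DeepLabV3+ model: mIoU reaches 73.47% and 63.43%, improvements of 3.24% and 15.11%; accuracy reaches 88.28% and 86.47%, improvements of 1.47% and 7.83%; and the F1 score reaches 84.29% and 77.04%, improvements of 3.86% and 13.46%. The improved DeepLabV3+ model better handles loss of detail and class imbalance and improves the performance and accuracy of land-cover segmentation in remote sensing images.
Keywords: MobileNetV2; dilated convolution; attention mechanism; loss function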
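A weighted joint Dice-Focal loss of the kind named in the entry above can be sketched for the binary case. The abstract does not give the weighting or the focal parameters, so the function names and the values `w_dice`, `w_focal`, `gamma`, and `alpha` below are assumptions chosen for illustration, and the multi-class form used for land-cover segmentation is not reproduced.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss: 1 - 2 * |P ∩ T| / (|P| + |T|), smoothed by eps."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss, averaged over pixels.

    (1 - p_t) ** gamma down-weights pixels that are already well classified.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p
        a_t = alpha if t == 1 else 1.0 - alpha
        p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_focal_loss(probs, targets, w_dice=0.5, w_focal=0.5):
    """Weighted sum of the two terms (weights are illustrative)."""
    return w_dice * dice_loss(probs, targets) + w_focal * focal_loss(probs, targets)


if __name__ == "__main__":
    target = [1, 0, 1]
    print(dice_focal_loss([0.9, 0.1, 0.8], target))
    print(dice_focal_loss([0.4, 0.6, 0.3], target))
```

The Dice term measures region overlap and is insensitive to how many background pixels there are, while the focal term concentrates the gradient on hard pixels; combining them is a common way to counter class imbalance.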
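Depth estimation for maize plant images based on hybrid grouped dilated convolution
18
Authors: 周云成, 刘忠颖, 邓寒冰, 苗腾, 王昌远. 《华南农业大学学报》 CSCD PKU Core, 2024, No. 2, pp. 280-292 (13 pages)
[Objective] To study image depth estimation for maize field scenes, address the insufficient accuracy that depth estimation models tend to show for lack of an effective photometric loss measure, and provide technical support for the design of vision systems and for navigation and obstacle avoidance in intelligent field machinery. [Method] Using a binocular camera as the vision sensor, an unsupervised scene depth estimation model based on hybrid grouped dilated convolution is proposed. A hybrid grouped dilated convolution structure and a corresponding self-attention mechanism are designed, from which an inverted residual module and a depth estimation backbone are built; illumination-insensitive image gradients and Gabor texture features are introduced into the measure of appearance difference between views to construct the optimization objective. Training and testing experiments are carried out on depth estimation for maize plants in the field. [Result] Compared with a fixed dilation factor, hybrid grouped dilated convolution reduces the mean relative error of field maize plant depth estimation by 63.9% and reduces the mean absolute error and root mean square error by 32.3% and 10.2%, respectively, a significant gain in accuracy. Introducing image gradients, Gabor texture features, and the self-attention mechanism further reduces the mean absolute error and root mean square error by 3.2% and 4.6%. Increasing the width and depth of the shallow encoder significantly improves depth estimation accuracy, but the same treatment has no obvious effect on the deep encoder. The self-attention mechanism is selective toward convolution groups with different dilation factors in the shallow inverted residual modules of the encoder, which indicates that it can adjust the receptive field autonomously. Compared with Monodepth2, the model reduces the mean relative error by 48.2% and the mean absolute error by 17.1%; within a 20 m sampling range, the mean absolute error of the estimated depth is below 16 cm, and the computation speed is 14.3 frames/s. [Conclusion] The depth estimation model based on hybrid grouped dilated convolution outperforms existing methods, effectively improves depth estimation accuracy, and meets the requirements of depth estimation for maize plant images in the field.
Keywords: depth estimation; dilated convolution; self-attention; unsupervised learning; maize plant images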
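A dense crowd counting network based on multi-scale perception
19
Authors: 李恒超, 刘香莲, 刘鹏, 冯斌. 《西南交通大学学报》 EI CSCD PKU Core, 2024, No. 5, pp. 1176-1183, 1214 (9 pages)
To address the diverse target scales and large scale variations of crowds in dense crowd scenes, a dense crowd counting network based on multi-scale perception is proposed. First, because small-scale targets account for a large share of the image, a dilated convolution module is introduced on top of the VGG-16 (Visual Geometry Group, 16-layer) network to mine image detail. Second, to make full use of multi-scale target information, a new context-aware module is built to extract contrast features between different scales. Finally, because target scale varies continuously, a multi-scale feature aggregation module is designed to widen the range of dense scale sampling and strengthen multi-scale information interaction, which improves network performance. Experimental results show that on the ShanghaiTech (Part_A/Part_B) and UCF_CC_50 datasets the mean absolute error (MAE) of the method is 62.5, 6.9, and 156.5, and the root mean square error (RMSE) is 95.7, 11.0, and 223.3, respectively; compared with the best competing method, MAE and RMSE decrease by 1.1% and 4.3% on the UCF_QNRF dataset and by 8.7% and 13.9% on the NWPU dataset.
Keywords: crowd density estimation; multi-scale aggregation; dilated convolution; density map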
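The MAE and RMSE figures quoted in the entry above are computed over per-image crowd counts. A minimal sketch of the two metrics follows; the function names and the example counts are made up for illustration and are unrelated to the reported results.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error; penalizes large per-image errors more heavily."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


if __name__ == "__main__":
    # Hypothetical per-image counts.
    predicted_counts = [103, 250, 48]
    true_counts = [100, 254, 48]
    print(mae(predicted_counts, true_counts))
    print(rmse(predicted_counts, true_counts))
```

RMSE is never smaller than MAE on the same data, so a large gap between the two (as on UCF_CC_50 above) points to a few images with very large counting errors.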
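Unsupervised video person re-identification based on multi-kernel dilated convolution
20
Authors: 刘仲民, 张长凯, 胡文瑾. 《数据采集与处理》 CSCD PKU Core, 2024, No. 5, pp. 1192-1203 (12 pages)
Person re-identification aims to retrieve a specific pedestrian across surveillance cameras. Varying imaging conditions such as pose changes, occlusion, and background clutter lead to insufficient extraction of pedestrian features. This paper proposes an unsupervised video person re-identification method using multi-kernel dilated convolution so that the extracted features express individual differences and characteristic information more completely and accurately. First, a pre-trained ResNet50 is used as the encoder; to further strengthen its feature extraction, a multi-kernel dilated convolution module is introduced that enlarges the receptive field of the convolution kernels, allowing the network to capture local and global feature information more effectively and describe pedestrian appearance more completely. Second, a decoder restores the high-level semantic information to lower-level feature representations, which strengthens the feature representation and improves performance under complex imaging conditions. Finally, a multi-scale feature fusion module is introduced at the decoder output to fuse features from adjacent layers, further narrowing the semantic gap between feature channel layers and producing more robust feature representations. Offline experiments on three mainstream datasets show that the method achieves notable improvements in accuracy and robustness.
Keywords: person re-identification; multi-kernel dilated convolution; unsupervised learning; feature extraction; attention mechanism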