Funding: Funded by the Fundamental Research Project of CNPC Geophysical Key Lab (2022DQ0604-4) and the Strategic Cooperation Technology Projects of China National Petroleum Corporation and China University of Petroleum-Beijing (ZLZX 202003).
Abstract: With the success of deep learning in image segmentation, seismic facies interpretation using convolutional neural networks has developed steadily. These automated methods significantly reduce manual labor, particularly the laborious task of labeling seismic facies by hand. However, their heavy demand for training data limits wider application. To overcome this challenge, we adopt the UNet architecture as the base network for seismic facies classification, since it has demonstrated effective segmentation results even with small-sample training data. Additionally, we integrate spatial pyramid pooling and dilated convolution modules into the network to enhance its perception of spatial information over a broader range. A seismic facies classification test on the public F3 block data verifies the superior performance of the improved network in delineating seismic facies boundaries. Comparison with the traditional UNet model shows that our method achieves more accurate classification results, as evidenced by various image segmentation evaluation metrics; notably, the classification accuracy reaches 96%. Furthermore, classification results along seismic slices further confirm the performance of the proposed method, which accurately defines the extent of the different seismic facies. This approach holds significant potential for analyzing geological patterns and extracting valuable depositional information.
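The abstract refers to "various evaluation metrics for image segmentation" without naming them. As a rough illustration only, the sketch below computes two metrics commonly used for this purpose, pixel accuracy and mean intersection-over-union, from a confusion matrix. The choice of these two metrics, the function names, and the tiny label lists are assumptions made for the example; they are not taken from the paper.

```python
# Minimal sketch: pixel accuracy and mean IoU from flat label lists.
# The labels and the class count are made-up illustration data.

def confusion_matrix(y_true, y_pred, num_classes):
    """Return cm where cm[t][p] counts pixels of true class t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def mean_iou(cm):
    """Average intersection-over-union over classes that occur at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class otherwise
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious)

# Eight "pixels" with three hypothetical facies classes.
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(pixel_accuracy(cm))  # 0.75
print(mean_iou(cm))
```

Pixel accuracy alone can look high when one facies dominates a section, which is one reason a per-class measure such as mean IoU is often reported alongside it.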
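Abstract: To address the loss of segmentation accuracy caused by lightweight network structures that extract insufficient effective semantic information from feature maps, and by poorly designed modules for fusing semantic information with spatial detail, this paper proposes a real-time semantic segmentation network that incorporates a global attention mechanism (global attention mechanism with real time semantic segmentation network, GaSeNet). First, a global attention mechanism is introduced into the semantic branch of a two-branch structure; it guides the convolutional neural network, along both the channel and spatial dimensions, to attend to the semantic categories relevant to the segmentation task and thereby extract more effective semantic information. Second, a hybrid dilated convolution block is designed in the spatial detail branch to enlarge the receptive field without changing the kernel size, capturing more global spatial detail and compensating for the loss of key feature information. Third, the feature fusion module is redesigned to introduce deep aggregation pyramid pooling, which deeply fuses feature maps of different scales and thereby improves the network's semantic segmentation performance. Finally, the proposed method is evaluated on the CamVid and Vaihingen datasets; comparison with recent semantic segmentation methods shows that GaSeNet improves segmentation accuracy by 4.29% and 16.06%, respectively, which verifies the effectiveness of the proposed method for real-time semantic segmentation.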
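The claim that dilated convolution enlarges the receptive field while the kernel size stays fixed can be checked with simple arithmetic. The sketch below assumes stride-1 layers, for which a k-tap kernel with dilation d spans d*(k-1)+1 input positions and each stacked layer adds (k-1)*d to the receptive field. The dilation rates (1, 2, 5) are an illustrative choice; the abstract does not state which rates GaSeNet uses.

```python
# Minimal sketch: receptive field of stacked stride-1 dilated convolutions.
# The dilation rates used below are illustrative assumptions, not GaSeNet's.

def effective_kernel(kernel_size, dilation):
    """Span (in input positions) covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 layers, one per dilation rate."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

print(effective_kernel(3, 5))         # 11: a 3-tap kernel with dilation 5
print(receptive_field(3, [1, 1, 1]))  # 7: three ordinary 3-tap layers
print(receptive_field(3, [1, 2, 5]))  # 17: same kernels, mixed dilation rates
```

Both stacks use the same number of weights per layer; only the spacing between the sampled positions differs.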
Funding: Funded by the Natural Science Foundation of Hunan Province under Grant No. 2021JJ31142 (author F.J), http://kjt.hunan.gov.cn/.
Abstract: Detecting non-motor drivers’ helmets has significant implications for traffic control. Currently, most helmet detection methods are susceptible to complex backgrounds and lack sufficient accuracy and robustness in small-object detection, which makes them unsuitable for practical application scenarios. Therefore, this paper proposes a new helmet-wearing detection algorithm based on You Only Look Once version 5 (YOLOv5). First, the Dilated convolution In Coordinate Attention (DICA) layer is added to the backbone network. DICA combines the coordinate attention mechanism with atrous convolution to replace the original convolution layer, which enlarges the receptive field of the network so that it captures more contextual information. It also reduces the network’s learning of unnecessary background features and directs attention to small objects. Second, the Rebuild Bidirectional Feature Pyramid Network (Re-BiFPN) is used as a feature extraction network. Re-BiFPN uses cross-scale feature fusion to combine high-level semantic features with low-level spatial features, which helps the model learn object features at different scales. Evaluated on the proposed “Helmet Wearing dataset for Non-motor Drivers” (HWND), the proposed model outperforms current detection algorithms, with a mean average precision (mAP) of 94.3% under complex backgrounds.
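The abstract does not describe DICA's internals. As background only, coordinate attention as generally described pools a feature map along each spatial direction separately, so that position along the other axis is preserved. The sketch below shows just that pooling step on a single-channel map; the function name and the sample values are hypothetical, and the later steps (the shared transform, the gating, and the dilated convolution that DICA adds) are omitted.

```python
# Minimal sketch: the direction-wise average pooling used by coordinate
# attention, on one channel. Illustrative only; this is not the DICA layer.

def coordinate_pool(feature):
    """feature: H x W nested list. Returns (per-row means, per-column means)."""
    h = len(feature)
    w = len(feature[0])
    pooled_h = [sum(row) / w for row in feature]  # average along the width
    pooled_w = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]  # along the height
    return pooled_h, pooled_w

fmap = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
ph, pw = coordinate_pool(fmap)
print(ph)  # [2.0, 5.0]       one value per row
print(pw)  # [2.5, 3.5, 4.5]  one value per column
```

Unlike a single global average, the two pooled vectors still indicate where along each axis the strong responses lie, which is what makes the mechanism position-aware.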
Funding: This research project was partially supported by the National Natural Science Foundation of China (Grant Nos. 62072015, U19B2039, U1811463) and the National Key R&D Program of China (Grant No. 2018YFB1600903).
Abstract: Crowd counting provides an important foundation for public security and urban management. Because crowd images contain small targets and large density variations, crowd counting is a challenging task. Mainstream methods usually apply convolutional neural networks (CNNs) to regress a density map, which requires annotations of individual persons as well as counts. Weakly-supervised methods avoid detailed labeling and require only counts as image annotations, but existing methods fail to achieve satisfactory performance because they usually ignore a global receptive field and multi-level information. We propose a weakly-supervised method, DTCC, which effectively combines multi-level dilated convolution and transformer methods to realize end-to-end crowd counting. Its main components are a recursive swin transformer and a multi-level dilated convolution regression head. The recursive swin transformer combines a pyramid visual transformer with a fine-tuned recursive pyramid structure to capture deep multi-level crowd features, including global features. The multi-level dilated convolution regression head comprises multi-level dilated convolution and a linear regression head for the feature extraction module. This module captures low- and high-level features simultaneously to enlarge the receptive field. In addition, two regression head fusion mechanisms realize dynamic and mean fusion counting. Experiments on four well-known benchmark crowd counting datasets (UCF_CC_50, ShanghaiTech, UCF_QNRF, and JHU-Crowd++) show that DTCC achieves results superior to other weakly-supervised methods and comparable to fully-supervised methods.
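Of the two fusion mechanisms mentioned, mean fusion is the simpler to picture: each regression head predicts a count and the predictions are averaged. The sketch below illustrates that idea together with mean absolute error, a metric commonly reported for crowd counting. The function names, the per-head counts, and the use of MAE here are assumptions for illustration; the abstract gives no details of either fusion mechanism, and dynamic fusion is not sketched.

```python
# Minimal sketch: averaging per-head count predictions and scoring with MAE.
# All numbers are made-up illustration data, not results from the paper.

def mean_fusion(head_counts):
    """Average the counts predicted by several regression heads for one image."""
    return sum(head_counts) / len(head_counts)

def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Two hypothetical heads, two images.
per_image_heads = [(98.0, 104.0), (51.0, 47.0)]
truth = [100, 50]

fused = [mean_fusion(heads) for heads in per_image_heads]
print(fused)              # [101.0, 49.0]
print(mae(fused, truth))  # 1.0
```

Only image-level counts appear on the ground-truth side, which matches the weakly-supervised setting where no per-person annotations are available.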