ConvNeXt Feature Extraction Study for Image Data
Abstract Convolutional neural networks have achieved strong results in computer vision. Tasks such as object detection and segmentation all depend on the extracted features, and problems such as ambiguous data and widely varying object shapes make feature extraction challenging. Conventional convolutional structures learn only the context of neighboring spatial positions in a feature map and cannot capture global information. Models based on self-attention offer a larger receptive field and can establish global dependencies, but they suffer from high computational complexity and require large amounts of data. To address this, the paper proposes a model that combines a CNN with an LSTM, which integrates the global information of image data more effectively while enhancing the local receptive field. Taking the ConvNeXt-T backbone as the base model, it concatenates convolution kernels of different sizes to fuse multi-scale features, which addresses the variation in object shape, and it aggregates bidirectional long short-term memory (BiLSTM) networks along the horizontal and vertical directions to capture the interaction between global and local information. Image classification experiments on the publicly available CIFAR-10, CIFAR-100, and Tiny ImageNet datasets show that the proposed network improves accuracy over the ConvNeXt-T base model by 3.18%, 2.91%, and 1.03%, respectively. The experiments indicate that the improved ConvNeXt-T network improves substantially on the base model in both parameter count and accuracy and extracts more effective feature information.
Authors YANG Pengyue (杨鹏跃); WANG Feng (王锋); WEI Wei (魏巍) (School of Computer & Information Technology, Shanxi University, Taiyuan 030006, China)
Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2024, Issue S01, pp. 283-289 (7 pages)
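Funding National Natural Science Foundation of China (62276158); Shanxi Province Research Funding Project for Returned Overseas Scholars (2021-007).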
Keywords Feature extraction; Local receptive field; ConvNeXt-T; Multi-scale features; Bidirectional long short-term memory network
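
The abstract describes aggregating bidirectional LSTMs along the horizontal and vertical directions of a feature map. A minimal, framework-free sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give layer sizes, how the two directions are fused, or where the module sits inside ConvNeXt-T. Here the four directional outputs are simply concatenated, weights are random, and the names `TinyLSTM`, `bidirectional`, and `bilstm_2d` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with random weights (illustrative only, untrained)."""

    def __init__(self, in_dim, hidden, rng):
        self.hidden = hidden
        # Rows are stacked per gate: input, forget, candidate, output.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hidden)]
                  for _ in range(4 * hidden)]
        self.b = [0.0] * (4 * hidden)

    def run(self, seq):
        """Return the hidden state at every step of seq (a list of vectors)."""
        n = self.hidden
        h, c = [0.0] * n, [0.0] * n
        outputs = []
        for x in seq:
            xh = x + h  # concatenate the input with the previous hidden state
            z = [sum(w * v for w, v in zip(row, xh)) + b
                 for row, b in zip(self.W, self.b)]
            i = [sigmoid(v) for v in z[0:n]]
            f = [sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [sigmoid(v) for v in z[3 * n:4 * n]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def bidirectional(fwd, bwd, seq):
    """Concatenate forward and backward hidden states at each position."""
    out_f = fwd.run(seq)
    out_b = bwd.run(seq[::-1])[::-1]
    return [a + b for a, b in zip(out_f, out_b)]


def bilstm_2d(fmap, hidden, seed=0):
    """fmap[y][x] is a channel vector; the result has 4 * hidden channels."""
    rng = random.Random(seed)
    height, width, channels = len(fmap), len(fmap[0]), len(fmap[0][0])
    h_fwd, h_bwd, v_fwd, v_bwd = (TinyLSTM(channels, hidden, rng)
                                  for _ in range(4))
    # Horizontal scan: every row is a sequence of `width` pixels.
    horiz = [bidirectional(h_fwd, h_bwd, row) for row in fmap]
    # Vertical scan: every column is a sequence of `height` pixels.
    cols = [[fmap[y][x] for y in range(height)] for x in range(width)]
    vert = [bidirectional(v_fwd, v_bwd, col) for col in cols]
    # Assumed fusion: concatenate horizontal and vertical features per pixel.
    return [[horiz[y][x] + vert[x][y] for x in range(width)]
            for y in range(height)]


fmap = [[[0.1 * (y + 1), 0.2 * (x + 1)] for x in range(4)] for y in range(3)]
out = bilstm_2d(fmap, hidden=3)
print(len(out), len(out[0]), len(out[0][0]))  # 3 4 12
```

In this sketch each output pixel sees its entire row and its entire column, so context reaches well beyond a local convolution window. A pixel that shares neither row nor column with a given location influences it only if such layers are stacked or mixed with other operations.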