Research on Visual Transformers Based on Class Queries


Abstract: In recent years, the Transformer has gradually become a mainstream architecture in computer vision. Its ability to model long-range dependencies and its high parallelism allow it to match the performance of convolutional neural networks (CNNs). However, applying the attention mechanism to computer vision currently faces two main problems: high computational complexity and the need for large amounts of training data. To address these issues, a category-query-based visual Transformer model (OB_ViT) is proposed. Its contributions are twofold: the introduction of learnable category queries and the use of a loss function based on the Hungarian algorithm. Specifically, learnable category queries serve as the input to the decoder, which allows the model to reason about the relationship between target categories and the global image context. In addition, the Hungarian algorithm is used to enforce unique predictions, ensuring that each category query learns only one target category. Image classification experiments on the Cifar10 and 5-class Flower datasets show that, compared with ViT and ResNet50, OB_ViT achieves significantly higher accuracy with fewer parameters. For example, on the Cifar10 dataset, the number of parameters is reduced by 15% and accuracy improves by 22%.
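The abstract does not spell out how the Hungarian-algorithm loss is built, so the following is only a minimal sketch of the general idea of one-to-one matching between category queries and target categories. It assumes, purely for illustration, that the matching cost is the negative log-probability a query assigns to a category; the function name `hungarian_match` and the toy probabilities are hypothetical and not taken from the paper.

```python
import math


def hungarian_match(cost):
    """Minimum-cost one-to-one assignment (Hungarian algorithm, O(n^2 * m)).

    cost[i][j] is the cost of giving row i (a query) column j (a category).
    Requires len(cost) <= len(cost[0]). Returns a list where entry i is the
    column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j]: row currently matched to column j, 0 = free
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:      # reached a free column: path is complete
                break
        while j0 != 0:          # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


# Toy example (made-up numbers): 3 category queries x 3 categories.
probs = [
    [0.6, 0.3, 0.1],   # query 0
    [0.7, 0.2, 0.1],   # query 1
    [0.1, 0.2, 0.7],   # query 2
]
# Assumed cost: negative log-probability of each (query, category) pair.
cost = [[-math.log(p) for p in row] for row in probs]

greedy = [row.index(max(row)) for row in probs]   # independent argmax
assignment = hungarian_match(cost)                # one-to-one matching
matching_loss = sum(cost[i][j] for i, j in enumerate(assignment))

print("greedy:", greedy)
print("hungarian:", assignment)
print("matching loss:", round(matching_loss, 4))
```

In this toy case, independent argmax lets queries 0 and 1 both claim category 0, while the one-to-one matching gives every query a distinct category, which is the "unique prediction" property the abstract attributes to the Hungarian algorithm.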

Authors: JIANG Chunyu; WANG Wei (School of Economics and Management, Jilin Institute of Chemical Technology, Jilin City 132022, China)

Affiliation: School of Economics and Management, Jilin Institute of Chemical Technology

Source: Journal of Jilin Institute of Chemical Technology (CAS), 2024, No. 3, pp. 62-67 (6 pages)

Keywords: Transformer; image classification; category queries; machine learning

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

