Journal Articles
2 articles found
Fish image classification based on positional-encoding overlapping patch embedding and multi-scale channel interaction attention
1
Authors: 周雯, 谌雨章, 温志远, 王诗琦. Journal of Computer Applications, 2024, No. 10, pp. 3209-3216 (8 pages)
Underwater fish image classification is a highly challenging task. The traditional Vision Transformer (ViT) backbone has notable limitations: it struggles with locally continuous features and performs poorly on fish classification with low-quality images. To address this problem, a Transformer image classification network, PIFormer (Positional overlapping and Interactive attention transFormer), based on positional-encoding Overlapping Patch Embedding (OPE) and Multi-scale Channel Interaction Attention (MCIA), is proposed. PIFormer is built in a hierarchical, multi-stage form, with each stage stacked a different number of times, which helps extract features at different depths. First, a deep Positional-encoding Overlapping Patch Embedding (POPE) module is introduced to slice the feature map together with edge information into overlapping patches, preserving the locally continuous features of the fish body; positional information is added to order the patches, helping PIFormer integrate detailed features and build a global mapping. Second, the MCIA module is proposed to process local and global features in parallel and to establish long-range dependencies among different parts of the fish body. Finally, high-level features are processed in groups by a Grouped Multi-Layer Perceptron (GMLP) to improve network efficiency and produce the final fish classification. To validate PIFormer's effectiveness, a self-built East Lake freshwater fish dataset is presented, and the public datasets Fish4Knowledge and NCFM (Nature Conservancy Fisheries Monitoring) are used to ensure fair comparison. Experimental results show that the proposed network achieves Top-1 classification accuracies of 97.99%, 99.71%, and 90.45% on these datasets respectively; compared with ViT, Swin Transformer, and PVT (Pyramid Vision Transformer) of the same depth, it reduces the parameter count by 72.62×10^6, 14.34×10^6, and 11.30×10^6 respectively, and saves 14.52×10^9, 2.02×10^9, and 1.48×10^9 floating-point operations (FLOPs) respectively. PIFormer thus delivers strong fish image classification performance at a lighter computational load.
Keywords: fish image classification; positional encoding; overlapping patch embedding; channel interaction attention; Vision Transformer
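
As a rough illustration of the overlapping patch embedding with positional encoding described in the abstract above, the following PyTorch sketch uses a convolution whose kernel is larger than its stride so that adjacent patches overlap, then adds a learnable positional encoding. The layer sizes, embedding dimension, and the use of a learnable (rather than fixed) positional encoding are assumptions for illustration, not the paper's actual POPE module.

```python
# Minimal sketch of overlapping patch embedding with positional encoding.
# All hyperparameters here are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn

class OverlappingPatchEmbed(nn.Module):
    def __init__(self, in_chans=3, embed_dim=64, patch_size=7, stride=4, img_size=224):
        super().__init__()
        # stride < kernel size => patches overlap, preserving locally
        # continuous features across patch borders.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size,
                              stride=stride, padding=patch_size // 2)
        num_patches = (img_size // stride) ** 2
        # Learnable positional encoding so token order is retained.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                  # (B, C, H', W')
        x = x.flatten(2).transpose(1, 2)  # (B, N, C) token sequence
        return self.norm(x + self.pos_embed)

tokens = OverlappingPatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 3136, 64])
```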
Discriminatively learning for representing local image features with quadruplet model
2
Authors: 张大龙, 赵磊, 许端清, 鲁东明. Optoelectronics Letters, EI, 2017, No. 6, pp. 462-465 (4 pages)
Traditional hand-crafted features for representing local image patches are evolving into data-driven, learning-based image features, but learning a robust, discriminative descriptor capable of handling various patch-level computer vision tasks is still an open problem. In this work, we propose a novel deep convolutional neural network (CNN) to learn local feature descriptors. We utilize quadruplets with positive and negative training samples, together with a constraint that restricts the intra-class variance, to learn discriminative CNN representations. Compared with previous works, our model reduces the overlap in feature space between corresponding and non-corresponding patch pairs, and mitigates the margin-variation problem caused by the commonly used triplet loss. We demonstrate that our method achieves better embedding results than recent works such as PN-Net and TN-TG on a benchmark dataset.
Keywords: representing; patch; descriptor; embedding; benchmark; utilize; constraint; overlap; capable; trained
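
As a rough illustration of the quadruplet-based training objective described in the abstract above, the following PyTorch sketch implements a generic quadruplet margin loss: matching pairs are pulled closer than non-matching pairs by one margin, and negatives are additionally separated by a second margin. The margin values, the Euclidean distance, and the intra-class variance term are assumptions, not the paper's exact formulation.

```python
# Generic quadruplet margin loss; margins and the intra-class term are
# illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, positive, neg1, neg2, m1=1.0, m2=0.5):
    """anchor/positive are matching patches; neg1/neg2 are non-matching."""
    d_ap = F.pairwise_distance(anchor, positive)  # intra-class distance
    d_an = F.pairwise_distance(anchor, neg1)      # anchor vs. negative
    d_nn = F.pairwise_distance(neg1, neg2)        # between two negatives
    # Push non-matching pairs beyond matching pairs by margin m1, and
    # additionally separate the two negatives by margin m2.
    loss = F.relu(d_ap - d_an + m1) + F.relu(d_ap - d_nn + m2)
    # Assumed intra-class variance constraint: also shrink d(a, p) directly.
    return (loss + 0.1 * d_ap).mean()

emb = lambda: torch.randn(8, 128)  # stand-in for CNN descriptor outputs
print(quadruplet_loss(emb(), emb(), emb(), emb()).item())
```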