Abstract
To address the inaccurate segmentation of hand edge details and the false or missed detection of small hand regions, a multi-scale hand segmentation method based on an attention mechanism is proposed. First, the Transformer module is redesigned and optimized with a window self-attention structure and a dual-branch feed-forward network (D-FFN): the window self-attention mechanism integrates global and local dependency information, while the D-FFN suppresses interference from background information. Second, a multi-scale feature extraction module combining strip pooling with a cascade network is proposed to enlarge the receptive field and improve the accuracy and robustness of the hand segmentation model. Finally, an up-sampling decoder module based on the Triplet Attention mechanism is proposed, which separates target features from redundant background features by adjusting attention weights along the channel and spatial dimensions. The proposed algorithm is evaluated on the public datasets GTEA (Georgia Tech Egocentric Activity) and EYTH (EgoYouTubeHands). Experimental results show that its mean intersection over union (MIoU) reaches 95.8% and 90.2% on the two datasets, respectively, improvements of 2.5% and 2.1% over the TransUNet algorithm, meeting the requirements of hand image segmentation for stability, reliability, high accuracy, and strong anti-interference capability.
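The MIoU figures reported above average the per-class intersection-over-union between the predicted and ground-truth masks. A minimal sketch of that metric, assuming binary hand/background label maps flattened to per-pixel class lists (the function name and toy data are illustrative, not from the paper):

```python
def miou(pred, gt, num_classes=2):
    """Mean Intersection over Union across classes.

    pred, gt: flat lists of per-pixel class labels of equal length.
    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # only score classes that appear somewhere
            ious.append(inter / union)
    return sum(ious) / len(ious)

# 4-pixel toy example: background = 0, hand = 1
pred = [0, 1, 1, 0]
gt   = [0, 1, 0, 0]
# class 0: inter 2, union 3 -> 2/3; class 1: inter 1, union 2 -> 1/2
print(miou(pred, gt))  # (2/3 + 1/2) / 2 ≈ 0.583
```

In practice segmentation frameworks accumulate per-class confusion counts over the whole test set before averaging, rather than scoring images independently as this sketch does.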
Authors
ZHOU Wenqing, DAI Sumin, WANG Yangping, WANG Wenrun
(School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; Gansu Artificial Intelligence and Graphics and Image Processing Engineering Research Center, Lanzhou 730070, China; Beijing Zhongdian Feihua Communication Co., Ltd., Beijing 100700, China)
Source
Chinese Journal of Liquid Crystals and Displays (《液晶与显示》)
Indexed in: CAS, CSCD, PKU Core Journals (北大核心)
2024, No. 11, pp. 1506-1518 (13 pages)
Funding
National Natural Science Foundation of China (Nos. 62067006, 62367005)
Gansu Provincial Intellectual Property Program (No. 21ZSCQ013)
Lanzhou Youth Science and Technology Talent Innovation Project (No. 2023-QN-117)
Youth Science Foundation of Lanzhou Jiaotong University (No. 2022012)
Major Cultivation Project of University Scientific Research and Innovation Platforms (No. 2024CXPT-17)