Abstract
Objective: Existing Transformer models achieve high accuracy in segmenting colorectal polyps with complex morphology, but their attention is dispersed, and information is lost when the multi-level semantic features output by the encoder are fused. Both problems limit further gains in accuracy. To address them, a new intestinal polyp image segmentation model, the Dual-Channel Aggregation Transformer (R-DCAformer), is proposed. Methods: R-DCAformer uses a pyramid Mix Transformer (MIT) and ResNet18 as the encoder, and a purpose-designed dual-channel aggregation (DCA) module as the decoder. The DCA decoder consists of an attention aggregation (AA) module and a dual-channel feature fusion (DFF) module. The pyramid MIT encoder gives the model sufficient generalization ability, the AA module limits attention dispersion in the MIT by fusing in additional features from ResNet18, and the DFF module alleviates information loss during the fusion of multi-level semantic information. Results: In the generalization experiments, R-DCAformer improved on the best baseline mDice, mIoU, and MAE by 2.10%, 1.65%, and 22.5%, respectively, on CVC-ColonDB, and by 2.56%, 2.12%, and 15%, respectively, on ETIS. On CVC-ClinicDB, it improved on the best baseline mDice and mIoU by about 0.85% and 1.35%; on Kvasir-SEG, it improved on the best baseline mDice, mIoU, and MAE by about 1.19%, 1.97%, and 17.39%, respectively. Ablation experiments and attention maps further support the effectiveness of the proposed modules. Conclusion: R-DCAformer performs well in both the learning and generalization experiments and generally outperforms the compared baseline models, providing a new high-performance model for colorectal polyp segmentation.
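The abstract reports mDice, mIoU, and MAE without defining them. The sketch below uses the standard definitions of these segmentation metrics; it is an illustration, not the authors' evaluation code. The function names, the flat 0/1 mask representation, and the convention of scoring two empty masks as 1.0 are all assumptions made here.

```python
from typing import Callable, Sequence


def dice(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Dice coefficient of two binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = sum(p & g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Intersection over union of two binary masks: |A∩B| / |A∪B|."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mae(prob: Sequence[float], gt: Sequence[int]) -> float:
    """Mean absolute error between a probability map and a binary mask."""
    return sum(abs(p - g) for p, g in zip(prob, gt)) / len(gt)


def mean_over_images(
    metric: Callable[[Sequence, Sequence], float],
    preds: Sequence[Sequence],
    gts: Sequence[Sequence],
) -> float:
    """Average a per-image metric over a dataset (the 'm' in mDice, mIoU)."""
    scores = [metric(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


# Toy 4-pixel example (hypothetical values, not from the paper).
prob_map = [0.9, 0.6, 0.2, 0.1]
pred_mask = [1 if p >= 0.5 else 0 for p in prob_map]  # assumed 0.5 threshold
gt_mask = [1, 0, 0, 0]

dice_score = dice(pred_mask, gt_mask)
iou_score = iou(pred_mask, gt_mask)
mae_score = mae(prob_map, gt_mask)
```

Higher Dice and IoU are better, while lower MAE is better, so an "improvement" in MAE corresponds to a smaller value.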
Authors
GAO Aiguo (高艾国)
ZHENG Xiaoliang (郑晓亮)
GAO Aiguo; ZHENG Xiaoliang (School of Electrical and Information Engineering, Anhui University of Science & Technology, Huainan, Anhui 232001, China)
Source
《重庆工商大学学报(自然科学版)》
2024, No. 5, pp. 49-57 (9 pages)
Journal of Chongqing Technology and Business University:Natural Science Edition
Funding
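Open Fund of the Joint National-Local Engineering Research Centre for Safe and Precise Coal Mining (EC2021006)
Scientific Research Start-up Fund for High-Level Introduced Talents of Anhui University of Science and Technology (2021YJRC02)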
Keywords
polyp image segmentation
deep learning
dual-channel aggregation
attention aggregation
generalization ability