Abstract
In recent years, the Transformer has shown great application potential in machine vision because it effectively captures global context information. However, it obtains context information at only a single scale and remains limited in extracting detail information. To address this problem, the RT-Unet algorithm is proposed. The algorithm introduces an intrinsic local inductive bias into the RESwin Transformer block and uses four successive convolution layers in the position embedding and encoding block. Effectively combining convolution with the Transformer yields rich multi-scale features while attending to both local details and long-range dependencies. In addition, the GELU activation function is adopted to increase the nonlinearity of the algorithm and avoid vanishing gradients during training. Experiments on the Synapse abdominal multi-organ segmentation dataset show that RT-Unet outperforms ViT, V-Net, U-Net, Swin-Unet, TU-Net, and other algorithms, achieving a DSC of 79.08% and an HD of 23.43 mm.
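The abstract names two quantities that are easy to state concretely: the GELU activation and the Dice similarity coefficient (DSC) used for evaluation. The following is a minimal sketch of their standard textbook definitions in plain Python, not the authors' implementation. The function names `gelu` and `dice_coefficient` are hypothetical, masks are assumed to be flat binary sequences, and returning 1.0 for two empty masks is an assumed convention. The Hausdorff distance (HD) is omitted.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def dice_coefficient(pred, target) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for two flat binary masks.

    Assumption: two empty masks count as a perfect match (1.0).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count positions labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Sum of foreground sizes of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # Unlike ReLU, GELU is small but nonzero for negative inputs.
    for x in (-1.0, 0.0, 1.0):
        print(x, gelu(x))
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a multi-organ setting such as the one described, a DSC like this would typically be computed per organ and then averaged; the abstract does not specify how the reported value was aggregated.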
Authors
ZHAO Jiamei (赵佳美); WU Dikang (吴迪康); WANG Zhifang (王志芳)
School of Electronic Engineering, Heilongjiang University, Harbin 150080, China
Source
Radio Engineering (《无线电工程》)
PKU Core Journal (北大核心)
2023, No. 2, pp. 381-386 (6 pages)
Funding
Natural Science Foundation of Heilongjiang Province (F2018025).