Abstract
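Objective Surgical instrument segmentation is a key step in the precise operation of surgical robots. However, accurate segmentation remains challenging because of complex factors such as low-contrast instruments, complicated surgical environments, specular reflection, and variations in instrument scale and shape. These factors lead to blurred boundaries and misclassified details, which reduce segmentation accuracy. To address these challenges, a new surgical instrument segmentation network is proposed for accurate segmentation of surgical instruments in endoscopic images.
Method To represent endoscopic images accurately and obtain effective feature maps, a dual-encoder structure that fuses a convolutional neural network (CNN) and a Transformer is proposed, so that the segmentation network extracts both detail features and global contextual semantic information. To enhance local feature maps, dilated convolution is introduced and a multi-scale attention fusion module is designed to obtain multi-scale attention feature maps. To address the class imbalance in surgical instrument segmentation, a global attention module is introduced to increase the network's focus on instrument regions and reduce its attention to irrelevant features.
Result To validate the model, two public surgical instrument segmentation datasets are used for performance analysis and testing. Ablation and comparison experiments, assessed both qualitatively and quantitatively, verify the effectiveness and superiority of the proposed algorithm. On the Kvasir-instrument dataset, the proposed algorithm achieves a Dice score of 96.46% and a mean intersection over union (mIOU) of 94.12%; on the Endovis2017 (2017 Endoscopic Vision Challenge) dataset, it achieves a Dice score of 96.27% and an mIOU of 92.55%. Compared with the state-of-the-art segmentation networks used for comparison, the proposed algorithm improves segmentation accuracy. The ablation study also confirms that the design is reasonable: removing any sub-module causes some loss of accuracy.
Conclusion The proposed segmentation model effectively combines the strengths of CNNs and Transformers, extracts both detail features and global context information, and segments surgical instruments accurately and stably.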
Objective Medical instruments are indispensable tools for surgical tasks. Reducing surgical trauma remains an open challenge. Emerging surgical robots can reduce the harm caused by surgical operations, and they offer higher stability and stronger learning ability than manual surgery. Precise segmentation of surgical instruments is key to the smooth operation of surgical robots. Existing segmentation methods can locate surgical instruments and roughly segment their shape. However, because of complex factors such as low instrument contrast, complex environments, specular reflection, and varying instrument sizes and shapes, these methods still lose boundary information and detailed features, resulting in blurred boundaries and misclassified details. To improve surgical instrument segmentation, we develop a dual-encoder fusion segmentation network based on a Transformer and a convolutional neural network (CNN) for surgical instrument segmentation in endoscopic images.
Method Within an encoder-decoder framework, a dual-encoder fusion segmentation network is used to construct an end-to-end surgical instrument segmentation scheme. To overcome weak feature representation and obtain effective context features, a dual-encoder block that fuses a Transformer and a CNN is built to extract local details and global context information from endoscopic images simultaneously. In addition, effective multi-scale feature extraction is essential for improving segmentation accuracy because surgical instruments vary in size and shape. To extract multi-scale attention feature maps, a multi-scale attention fusion module is embedded into the bottleneck layer to enhance local feature maps. To address the class imbalance in the surgical instrument segmentation task, an attention gated block is also introduced into the decoder unit so that the segmentation network focuses better on the surgical instruments and pays less attention to irrelevant features.
Result To verify the effectiveness and potential of the proposed dual-encoder fusion segmentation network, two public surgical instrument segmentation datasets are adopted: the Kvasir-instrument dataset (gastrointestinal endoscopy) and the Endovis2017 dataset (robot-assisted surgery). Combining qualitative and quantitative analysis, segmentation performance is tested in three kinds of experiments: ablation, comparison, and visualization. The proposed network obtains good segmentation results on both datasets, achieving a Dice score of 96.46% and a mean intersection over union (mIOU) of 94.12% on the Kvasir-instrument dataset, and a Dice score of 96.27% and an mIOU of 92.55% on the Endovis2017 dataset. Relative to the state-of-the-art methods compared, the Dice score is 1.51% higher and the mIOU 2.52% higher than those of the progressive alternating attention network (PAANet) on the Kvasir-instrument dataset, and the Dice score is 1.62% higher and the mIOU 2.22% higher than those of the refined attention segmentation network (RASNet) on the Endovis2017 dataset. Furthermore, to verify the effectiveness of each sub-module, ablation experiments with quantitative and qualitative analysis are also carried out. The dual-encoder module is shown to improve segmentation accuracy on both the Kvasir-instrument and Endovis2017 datasets.
Conclusion To address problems in surgical instrument segmentation such as specular reflection, varying shapes and sizes, and class imbalance, a CNN and Transformer based dual-encoder fusion segmentation network is developed to build an end-to-end surgical instrument segmentation scheme. The proposed method is expected to segment surgical instruments of various shapes and sizes accurately in endoscopic images, which could support robot-assisted surgery.
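The Dice score and mIOU reported in the abstract can be illustrated with a minimal sketch. It assumes binary masks (1 for instrument, 0 for background) stored as nested lists, and it assumes mIOU is the IoU averaged over the two classes; the abstract does not state the exact averaging protocol, so the paper's own evaluation may differ. The function names `dice_score`, `iou`, and `mean_iou` are hypothetical and chosen only for this example.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]


def _counts(pred: Mask, target: Mask, cls: int) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives for one class."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == cls and t == cls:
                tp += 1
            elif p == cls:
                fp += 1
            elif t == cls:
                fn += 1
    return tp, fp, fn


def dice_score(pred: Mask, target: Mask, cls: int = 1) -> float:
    """Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when the class is absent in both masks."""
    tp, fp, fn = _counts(pred, target, cls)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def iou(pred: Mask, target: Mask, cls: int = 1) -> float:
    """IoU = TP / (TP + FP + FN); defined as 1.0 when the class is absent in both masks."""
    tp, fp, fn = _counts(pred, target, cls)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


def mean_iou(pred: Mask, target: Mask, classes: Sequence[int] = (0, 1)) -> float:
    """Average the per-class IoU over the given classes (an assumed protocol)."""
    return sum(iou(pred, target, c) for c in classes) / len(classes)


if __name__ == "__main__":
    # Toy 2x3 masks: the prediction has one false-positive instrument pixel.
    pred = [[0, 1, 1], [0, 1, 0]]
    target = [[0, 1, 1], [0, 0, 0]]
    print(f"Dice: {dice_score(pred, target):.4f}")
    print(f"mIOU: {mean_iou(pred, target):.4f}")
```

For a single class the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU for that class.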
Authors
杨磊
谷玉格
边桂彬
刘艳红
Yang Lei; Gu Yuge; Bian Guibin; Liu Yanhong (School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journal
2023, No. 10, pp. 3214-3230 (17 pages)
Journal of Image and Graphics
Funding
National Key Research and Development Program of China (2020YFB1313701)
National Natural Science Foundation of China (62003309).