
Cephalometric landmark keypoints localization based on convolution-enhanced Transformer
Abstract

Objective: Accurate and reliable cephalometric image measurement and analysis, which usually depend on the correlations among anatomical landmark points, play an essential role in orthodontic diagnosis, preoperative planning, and treatment evaluation. However, manual annotation limits the speed and accuracy of measurement, so an automatic cephalometric landmark detection algorithm for daily diagnosis needs to be developed. Anatomical landmarks occupy only a small proportion of an image, and structures at different positions may share similar curvatures, shapes, and surrounding soft-tissue appearance that are difficult to distinguish. Current methods based on convolutional neural networks (CNNs) extract deep features by down-sampling to facilitate global connection building, but they may suffer from spatial information loss and inefficient context modeling, preventing them from meeting accuracy requirements in clinical applications. The Transformer has advantages in long-range dependency modeling but is weak at capturing local features, which explains the insufficient accuracy of pure-Transformer models for keypoint localization. An end-to-end model with global context modeling and better local spatial feature representation must therefore be built.

Method: To detect anatomical landmarks efficiently and effectively, this paper proposes CETransNet (convolutional enhanced Transformer network), a U-shaped architecture based on a convolution-enhanced Transformer, to locate keypoints in lateral cephalometric images. The success of UNet lies in its ability to analyze the local fine-grained nature of an image at deep levels, but it suffers from global spatial information loss. By improving the Transformer module and introducing it into the U-shaped structure, the ability of convolutional networks to obtain local information is retained while global context connections are established. In addition, to regress and predict the heatmaps efficiently, an exponential weighted loss function is proposed so that, during supervised learning, pixels near a landmark receive more attention and the loss of distant pixels is suppressed. Each image is rescaled to 768 × 768 pixels with its original aspect ratio maintained via zero padding, and data augmentation is performed with random rotation, Gaussian noise addition, and elastic transformation. Training is conducted on a server with Tesla V100 SXM3-32GB GPUs. The model is optimized with Adam at a batch size of 2; the initial learning rate is 0.0001 and is multiplied by 0.75 every 5 epochs.

Result: To demonstrate its strengths, CETransNet is compared with state-of-the-art methods, and ablation studies confirm the contribution of each component. Experiments were performed on a public X-ray cephalometric dataset. Quantitatively, CETransNet obtains mean radial error (MRE) values of 1.09 mm and 1.43 mm on the two test datasets, and the accuracies within the clinically accepted 2 mm error are 87.16% and 76.08%, respectively. Nine keypoints in Test1 achieve a 100% successful detection rate (SDR) within 4 mm, and up to 12 landmarks reach 90% detection accuracy within the clinically allowable 2.0 mm region. In Test2, although only 9 points reach a 90% SDR within 2 mm, 10 points are completely detected within 4 mm. Compared with the best competing method, CETransNet improves the MRE by 2.7% and 2.1% on the two datasets, respectively. CETransNet also outperforms other popular vision Transformer methods on the benchmark Test1 dataset, achieving a 2.16% SDR improvement within 2 mm over the sub-optimal model. An analysis of the backbone network's influence on model performance shows that ResNet-101 reaches the minimal MRE, while ResNet-152 obtains the best SDR within 2 mm. Ablation studies show that the convolution-enhanced Transformer decreases the MRE by 0.3 mm and improves the SDR within 2.0 mm by 7.36%, while the proposed EWSmoothL1 loss further reduces the MRE to 1.09 mm. Benefiting from these components, CETransNet detects the positions of anatomical landmarks quickly, accurately, and robustly.

Conclusion: This paper proposes a cephalometric landmark detection framework with a U-shaped architecture that embeds a convolution-enhanced Transformer in each residual layer. By fusing the advantages of Transformers and CNNs, the framework effectively captures long-range dependencies and local detail, obtaining the specific position and structure information of keypoints. To address the ambiguity caused by similar structures elsewhere in an image, an exponential weighted loss function is proposed so that the model focuses on the loss of the target area rather than other regions. Experimental results show that CETransNet achieves the best MRE and SDR performance among the compared advanced methods, especially within the clinically allowable 2.0 mm region. A series of ablation experiments also proves the effectiveness of the proposed modules, confirming that CETransNet performs competently in anatomical landmark detection and has great potential for cephalometric analysis and treatment planning. In future work, lightweight models with better robustness will be designed.
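The preprocessing step described above (rescale to 768 × 768 while keeping the original aspect ratio, filling the remainder with zeros) can be sketched as follows. The interpolation method is not stated in the abstract, so nearest-neighbour resampling here is an assumption chosen to keep the sketch dependency-free:

```python
import numpy as np

TARGET = 768  # output side length stated in the abstract

def pad_resize(image: np.ndarray, target: int = TARGET) -> np.ndarray:
    """Rescale so the longer side equals `target`, keep the original
    aspect ratio, and zero-pad the rest of the square canvas."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    # nearest-neighbour index maps for the resized grid
    rows = np.clip((np.arange(new_h) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(new_w) / scale).astype(int), 0, w - 1)
    resized = image[rows][:, cols]
    out = np.zeros((target, target) + image.shape[2:], dtype=image.dtype)
    out[:new_h, :new_w] = resized  # zero padding fills the remainder
    return out
```

In practice a bilinear or bicubic resize (e.g. from an imaging library) would likely be used instead; only the aspect-preserving scale and zero-padding layout follow the abstract.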
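The training schedule (initial learning rate 0.0001, decreased by a factor of 0.75 every 5 epochs) reduces to a simple step decay. Reading "decreased by 0.75 times" as "multiplied by 0.75" is an interpretation on my part:

```python
def learning_rate(epoch: int, base_lr: float = 1e-4,
                  gamma: float = 0.75, step: int = 5) -> float:
    """Step decay: start at base_lr, multiply by gamma every `step` epochs."""
    return base_lr * gamma ** (epoch // step)
```

With PyTorch this corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.75)` on top of an Adam optimizer.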
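The exponential weighted loss (EWSmoothL1) is described only qualitatively: pixels near a landmark should contribute more loss, distant pixels less. One plausible sketch weights a SmoothL1 term by an exponential of the target Gaussian heatmap, which peaks at the landmark; the exact formulation in the paper may differ:

```python
import numpy as np

def ew_smooth_l1(pred: np.ndarray, target: np.ndarray,
                 alpha: float = 3.0, beta: float = 1.0) -> float:
    """Exponentially weighted SmoothL1 over a predicted heatmap.

    Assumed weighting: the target heatmap peaks at the landmark, so
    exp(alpha * target) grows exponentially near the landmark while
    staying ~1 for distant (near-zero) pixels, suppressing their loss
    relative to the target region.
    """
    diff = np.abs(pred - target)
    # standard SmoothL1 (Huber-style) element-wise term
    smooth_l1 = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    weight = np.exp(alpha * target)
    return float((weight * smooth_l1).mean())
```

The same prediction error therefore costs roughly `exp(alpha)` times more at the heatmap peak than in the background, which matches the stated goal of focusing supervision on the target area.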
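The evaluation metrics, mean radial error (MRE) and successful detection rate (SDR) within a radius, can be computed as below. The 0.1 mm/pixel spacing is the value commonly used for the public ISBI cephalometric benchmark and is an assumption here:

```python
import numpy as np

def mre_sdr(pred: np.ndarray, gt: np.ndarray, spacing_mm: float = 0.1,
            radii_mm=(2.0, 2.5, 3.0, 4.0)):
    """MRE in mm and SDR (%) within each radius.

    pred, gt: (N, 2) landmark coordinates in pixels.
    """
    dist_mm = np.linalg.norm(pred - gt, axis=1) * spacing_mm
    mre = float(dist_mm.mean())
    sdr = {r: float((dist_mm <= r).mean() * 100.0) for r in radii_mm}
    return mre, sdr
```

The abstract's headline numbers (e.g. MRE 1.09 mm, SDR 87.16% within 2 mm on Test1) are exactly these quantities averaged over all landmarks and test images.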
Authors: Yang Heng; Gu Chenliang; Hu Houmin; Zhang Jing; Li Kang; He Ling (School of Electrical Engineering, Sichuan University, Chengdu 610065, China; China Southwest Electronic Technology Research Institute, Chengdu 610036, China; School of Biomedical Engineering, Sichuan University, Chengdu 610065, China)
Source: Journal of Image and Graphics (《中国图象图形学报》), CSCD / Peking University Core, 2023, No. 11, pp. 3590-3601 (12 pages)
Funding: National Key Research and Development Program of China (2020YFB1711500); 1·3·5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (ZYYC21004)
Keywords: cephalometric measurement; landmark keypoint localization; vision Transformer; attention mechanism; heatmap regression; convolutional neural network (CNN)