A survey of medical image captioning techniques: encoding, decoding and latest advances

Cited by: 4


Abstract: As medical imaging technology keeps improving, the number of medical reports that radiologists must write each day keeps growing. Since the rise of deep learning, deep-learning-based medical image captioning has been used to generate medical reports automatically, with notable results. This paper comprehensively reviews recent work on deep medical image captioning, including the latest methods, datasets, and evaluation metrics, analyzes their respective strengths and weaknesses, and organizes the discussion by model structure. It is the first survey of the medical image captioning task published in China. Current deep medical image captioning techniques mainly extend the encoder-decoder structure, including but not limited to retrieval methods, template matching, attention mechanisms, reinforcement learning, and knowledge graphs. Retrieval and template matching methods are simple, but because of the particular nature of medical reports they still perform well on this task. Attention mechanisms let the model focus on a part of the image and text while generating the report, and almost all mainstream models have adopted them. Reinforcement learning overcomes the bottleneck caused by the mismatch between gradient-descent training and discrete language generation metrics. Knowledge graph methods incorporate physicians' prior knowledge of diseases and effectively improve the clinical accuracy of generated reports. In addition, new architectures such as Transformer are increasingly replacing recurrent neural networks (RNN) and even convolutional neural networks (CNN) as the network backbone. Finally, the paper discusses the open problems of deep medical image captioning and future research directions, in the hope of moving the technique toward real deployment.

Medical image captioning is a labor-intensive daily task for radiologists. The emerging deep medical image captioning technique has the potential to generate medical captions automatically. The survey addresses three challenges: 1) organizing a feasible and clear structure for readers; 2) strengthening the deep medical image captioning task itself; 3) optimizing the introduced methods. First, the aims and objectives are identified. Then the literature on deep medical image captioning up to 2021 is reviewed, including the latest methods, datasets, and evaluation metrics, together with a comparative analysis of the medical and generic image captioning tasks. Deep image captioning techniques are introduced according to their network structure. Current deep medical image captioning is mainly developed from the encoder-decoder structure by adding retrieval-based methods, template-matching methods, attention mechanisms, reinforcement learning, and knowledge graphs. Specifically, the encoder-decoder structure combines a convolutional neural network (CNN) for image feature extraction with a recurrent neural network (RNN) for caption generation, and the two networks are linked by an intermediate vector called the context vector. Some models use a CNN-RNN-RNN structure, called hierarchical RNN or long short-term memory (LSTM). This structure stacks two kinds of RNN: one generates topic vectors, and the other generates the caption under the guidance of the topic vector. Although retrieval-based and template-matching methods are relatively simple, they remain effective because medical captions are highly repetitive and follow special sentence patterns. The attention mechanism lets the model focus on a certain part of the image and sentence while the caption is generated, and it makes the length of the context vector variable. Reinforcement learning (RL) for medical image captioning alleviates the mismatch between gradient-descent training and discrete language generation metrics. RL can also work in a multi-agent form that guides the form of the output before the decoder works, producing well-balanced and logical medical content. A knowledge graph integrates expert prior knowledge into the model: diseases with similar features sit at closer nodes in the graph, and disease information is updated through graph convolution. Integrating a medical knowledge graph effectively improves the clinical accuracy of the generated report. These methods are compatible with one another; for example, a template-matching method and attention-based RL can be used together. In addition, Transformer-related structures are developing rapidly as the new backbone network beyond RNN and CNN. The Transformer, or self-attention block, can be trained in parallel and captures long-distance dependencies between tokens, which makes it a better feature extractor. Popular datasets in deep medical image captioning are IU X-Ray and MIMIC-CXR, which pair frontal and lateral chest X-ray images with multi-sentence reports. Medical annotations such as Medical Subject Headings (MeSH) or Unified Medical Language System (UMLS) keywords help generate more accurate reports because they can be treated as extra information, and classifying these tags can serve as a pretraining task. Generic natural language generation (NLG) metrics are applied to evaluate the reports generated by deep medical image captioning models. Newer metrics such as SPICE, SPIDEr, and BERTScore have been developed beyond the existing BLEU-n, ROUGE, METEOR, and CIDEr scores. Finally, future research directions are predicted in four aspects: 1) More diverse and more accurate datasets, covering other modalities such as magnetic resonance imaging (MRI) and color Doppler ultrasound. Current datasets mostly consist of chest X-ray images, which are limited to a single body part and a single modality, so broader data would make models more robust and adaptive to various tasks. 2) Clinical evaluation metrics that are more accurate and cost-effective than generic NLG metrics such as BLEU and ROUGE. Existing generic NLG metrics are not the best evaluation in medicine, and better metrics would reduce the demand on radiologists' time. 3) Unsupervised and semi-supervised methods, which can lower the dataset-related cost of the medical image captioning task. Cost and the number of training samples can be reduced by building on existing pre-trained models such as ViLBERT and VL-BERT. 4) More prior knowledge integrated into the model, and multi-round conversational medical report generation that provides more detail.
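The abstract's point about discrete language generation metrics can be made concrete with clipped n-gram precision, the counting step underlying BLEU-n. The following is a minimal sketch and not the survey's own code: the function names and the two example sentences are hypothetical, and full BLEU additionally combines several n-gram orders with a geometric mean and a brevity penalty.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Clipped n-gram precision, the counting step behind BLEU-n.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference.
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical report sentences, not taken from IU X-Ray or MIMIC-CXR.
reference = "no acute cardiopulmonary abnormality".split()
candidate = "no acute abnormality".split()

p1 = clipped_precision(candidate, reference, 1)  # unigram precision
p2 = clipped_precision(candidate, reference, 2)  # bigram precision
print(p1, p2)
```

Because the score is built from integer match counts over sampled tokens, it changes in jumps and has no useful gradient with respect to the model's parameters, which is why such a metric is used as a reward in reinforcement learning rather than as a loss for gradient descent.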

Authors: Zhu Yi; Li Xiu (Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China)

Affiliation: Shenzhen International Graduate School, Tsinghua University

Source: Journal of Image and Graphics (中国图象图形学报), indexed in CSCD and the Peking University Core Journals list; 2023, No. 7, pp. 1990-2010 (21 pages)

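Funding: National Key Research and Development Program of China (2020AAA0108303); Shenzhen Science and Technology Innovation Commission project (JCY2020109143041798); Shenzhen Stable Support Program for Universities (WDZC20200820200655001).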

Keywords: deep learning (DL); medical image captioning; automatic radiology report generation; encoder-decoder; image captioning

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]


