
A Multi-Modal Medical Image Analysis Algorithm Based on Text Guidance
Abstract: Combining gastroscopic ultrasound and white light endoscopy can improve the accuracy of identifying gastrointestinal stromal tumors (GISTs). However, existing multi-modal methods often focus solely on image features and overlook the semantic information contained in diagnostic text, which is important for precisely understanding and diagnosing medical images. To address this issue, we propose a novel Text-guided Multi-modal Medical image analysis framework (TMM-Net). TMM-Net uses multi-stage diagnostic text to guide model learning so that the model extracts the features carrying key diagnostic information from the images, and then promotes interaction between the multi-modal features through a cross-modal attention mechanism. Notably, TMM-Net simulates the clinical diagnostic process by predicting lesion attributes, which enhances interpretability. Validation experiments were conducted on a dataset of 10,025 modality data pairs from two centers. The results show that the proposed method improves accuracy by 7.7% over the current state-of-the-art GISTs diagnostic method and achieves the highest AUC (Area Under the Curve) value, 0.927, while its interpretability may better suit clinical needs.
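The cross-modal attention step mentioned in the abstract can be illustrated with a generic scaled dot-product attention sketch. This is not TMM-Net's published implementation: the helper names (`cross_modal_attention`, `fuse`), the toy feature vectors, the choice of letting white light endoscopy (WLE) tokens query gastroscopic ultrasound (EUS) tokens, and the residual addition are all assumptions made for illustration, and learned projection matrices are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention where queries come from one modality and
    keys/values from the other (hypothetical helper, no learned weights)."""
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in queries:
        # Similarity of this query token to every token of the other modality.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Attention-weighted sum of the other modality's value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(wle_tokens: Matrix, eus_tokens: Matrix) -> Matrix:
    # Assumption: each WLE token attends to the EUS tokens, and the attended
    # context is added back to the WLE token as a residual connection.
    context = cross_modal_attention(wle_tokens, eus_tokens, eus_tokens)
    return [[a + b for a, b in zip(t, c)] for t, c in zip(wle_tokens, context)]


if __name__ == "__main__":
    wle = [[1.0, 0.0], [0.0, 1.0]]              # two toy WLE feature tokens
    eus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy EUS feature tokens
    fused = fuse(wle, eus)
    print(len(fused), len(fused[0]))  # prints "2 2"
```

Each fused WLE token now carries information weighted by its similarity to the EUS tokens. A fuller version would project queries, keys, and values through learned matrices and use several heads, as standard Transformer attention does; the sketch leaves these out to keep the data flow between modalities visible.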
Authors: FAN Lin (樊琳), GONG Xun (龚勋), ZHENG Cen-yang (郑岑洋) (School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu, Sichuan 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Chengdu, Sichuan 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu, Sichuan 611756, China)
Source: Acta Electronica Sinica (《电子学报》), 2024, No. 7, pp. 2341-2355 (15 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
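Funding: National Natural Science Foundation of China (No. 62376231); Key R&D Program of Sichuan Province (No. 2023YFG0267); Science and Technology Project of the Health Commission of Sichuan Province (No. 23LCYJ022).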
Keywords: multi-modal fusion; model interpretability; image-text matching; gastrointestinal stromal tumor; gastroscopic ultrasound; white light endoscopy