Abstract
[Objective] This paper introduces external knowledge through a knowledge graph and combines it with multimodal fusion and confidence detection mechanisms to explore the relationship between clinical questions and medical images, aiming to improve performance on medical visual question answering (VQA) tasks. [Methods] We propose a novel medical VQA model consisting of a text knowledge enhancement layer, an image embedding layer, a multimodal fusion layer, a confidence detection layer, and a prediction layer. The text knowledge enhancement layer embeds an external knowledge graph into the clinical question representation; the image embedding layer obtains the medical image representation; the multimodal fusion layer captures the interactions between text and image; the confidence detection layer assesses the reliability of the data; and the prediction layer generates the prediction results. We evaluated the model empirically on the VQA-RAD and PathVQA datasets. [Results] For open-domain question answering, the best accuracy of the proposed model reached 59.3% on VQA-RAD and 16.2% on PathVQA, demonstrating the model's effectiveness. [Limitations] Only a single-language setting was considered; the model still needs to be validated on multilingual datasets. [Conclusions] This study significantly improves performance on medical VQA tasks. It offers a useful reference for improving the quality and efficiency of healthcare services and for sample augmentation in certain specialized domains.
Authors
张殿元
余传明
Zhang Dianyuan; Yu Chuanming (School of Information Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China)
Source
《数据分析与知识发现》
EI
CSSCI
CSCD
PKU Core (Peking University Core Journals)
2024, Issue 8, pp. 226-239 (14 pages)
Data Analysis and Knowledge Discovery
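Funding
This work is one of the research outcomes supported by the National Natural Science Foundation of China (Grant Nos. 72374219, 71974202)
and the Humanities and Social Sciences Foundation of the Ministry of Education of China (Grant No. 19YJC870029)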
Keywords
Knowledge Graph
Medical Visual Question-Answering
Knowledge Enhancement
Multi-Modal Fusion