Masked Image-language Distillation Model for Pulmonary Edema Assessment
Abstract: Pulmonary edema assessment is critical to the treatment of acute congestive heart failure (CHF). Multimodal masked autoencoders for vision-language pre-training have been shown to effectively fuse multimodal information from chest radiographs and pulmonary edema radiology reports, improving the accuracy of pulmonary edema quantification. However, existing methods mask images and text at random. This unstable operation can cause the model to overlook image lesions and text keywords, hinders the fusion and alignment of multimodal information, and ultimately degrades quantification accuracy. To address these problems, this study designs a masked image-language distillation model that, for the first time, introduces self-distillation into medical image-language pre-training, so that the model learns more stable and reliable medical image and language representations. The study also optimizes the cross-modal attention fusion mechanism so that the model better fuses and aligns multimodal information. The proposed method was compared with the 101-layer residual network (ResNet101), vision transformer (ViT)-B/16, joint modeling of chest radiographs and radiology reports for pulmonary edema assessment (JMC3R), and multi-modal masked autoencoders for medical vision and language pre-training (M3AE). It achieves higher pulmonary edema quantification accuracy than all of these methods on the pulmonary edema assessment dataset (PEAD).
Authors: LU Demin (卢得民), ZHONG Cheng (钟诚), YANG Feng (杨锋) (School of Computer, Electronics and Information, Guangxi University, Nanning, 530004; Laboratory of Parallel, Distributed and Intelligent Computing of Guangxi Universities and Colleges, Nanning, 530004)
Source: Genomics and Applied Biology (《基因组学与应用生物学》), 2024, No. 2, pp. 274-283 (10 pages); indexed in CAS, CSCD, and the Peking University Core Journals list
Funding: Supported by the National Natural Science Foundation of China (61861004, 61962004).
Keywords: Pulmonary edema; Self-distillation; Mask modeling; Multimodal; Attention mechanism
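The abstract names three mechanisms but does not give their formulas. These are random masking of image and text tokens (the baseline behavior it criticizes), self-distillation, and cross-modal attention fusion. The NumPy sketch below illustrates generic versions of each, under stated assumptions. It is not the authors' model. The following are all hypothetical choices made for illustration:

- the toy one-layer encoder;
- the 0.75 mask ratio;
- the EMA momentum of 0.996;
- the data2vec-style feature-regression loss at masked positions;
- every function name (`random_mask`, `encode`, `ema_update`, `self_distillation_loss`, `cross_attention`).

```python
import numpy as np

# Hypothetical sketch: generic random masking, EMA-teacher self-distillation,
# and cross-attention. Not the paper's architecture; names and
# hyperparameters are illustrative only.


def random_mask(num_tokens, mask_ratio, rng):
    """Uniform random mask (True = masked), as in MAE/M3AE-style pre-training."""
    num_masked = int(round(num_tokens * mask_ratio))
    mask = np.zeros(num_tokens, dtype=bool)
    mask[rng.permutation(num_tokens)[:num_masked]] = True
    return mask


def encode(tokens, weight):
    """Toy encoder (one linear layer + tanh) standing in for a Transformer."""
    return np.tanh(tokens @ weight)


def ema_update(teacher_w, student_w, momentum=0.996):
    """Teacher weights follow an exponential moving average of the student's."""
    return momentum * teacher_w + (1.0 - momentum) * student_w


def self_distillation_loss(tokens, mask, student_w, teacher_w, mask_token):
    """Student encodes the masked input, teacher encodes the full input;
    the loss is the MSE between their features at masked positions only."""
    student_in = np.where(mask[:, None], mask_token[None, :], tokens)
    student_feat = encode(student_in, student_w)
    teacher_feat = encode(tokens, teacher_w)  # target; gradient-free in training
    diff = student_feat[mask] - teacher_feat[mask]
    return float(np.mean(diff ** 2))


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product cross-attention: each query token (e.g. an image
    patch) attends over all context tokens (e.g. report words)."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over context tokens
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    image_tokens = rng.normal(size=(49, dim))  # hypothetical 7x7 patch embeddings
    text_tokens = rng.normal(size=(12, dim))   # hypothetical report token embeddings

    student_w = rng.normal(scale=0.1, size=(dim, dim))
    teacher_w = student_w.copy()
    mask_token = np.zeros(dim)  # hypothetical learned [MASK] embedding

    mask = random_mask(len(image_tokens), mask_ratio=0.75, rng=rng)
    loss = self_distillation_loss(image_tokens, mask, student_w, teacher_w, mask_token)
    teacher_w = ema_update(teacher_w, student_w)

    w_q, w_k, w_v = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
    fused, attn = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
    print(int(mask.sum()), round(loss, 4), fused.shape, attn.shape)
```

In this formulation the teacher sees the unmasked tokens and supplies the targets. The student must therefore predict the representation of masked regions rather than raw pixels or words. The abstract does not say whether the paper regresses features like this or matches softmax distributions (as in DINO/iBOT). No training loop is shown. In practice, the student weights would be updated by backpropagation before each EMA step.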