Fundus Image Classification Research Based on Ensemble Convolutional Neural Network and Vision Transformer

Cited by: 6
Abstract: Convolutional neural networks (CNNs) are widely used to classify fundus images, but as Transformer applications have advanced, the ViT (Vision Transformer) model has shown higher performance in medical imaging. However, ViT models usually require pretraining on large datasets, which is constrained by the high cost of acquiring medical images. This paper therefore proposes a fundus image classification method based on an EfficientNet-ViT ensemble model. The method combines the CNN model EfficientNetV2-S with the ViT model, so that fundus image features are extracted in two completely different ways. An adaptive weighted fusion algorithm yields optimal weighting factors of 0.6 and 0.4, and the two models are combined by weighted soft voting to obtain better classification results. Experiments show that, compared with the individual models before ensembling, the classification accuracy of the ensemble model improves by 0.5% and 1.6%, respectively.

Objective: As the prevalence and blindness rate of fundus diseases increase, limited ophthalmologist resources are increasingly unable to meet the demand for examinations. Given the shortage of ophthalmic medical staff, long waits for treatment, and difficult access to care in remote areas, using artificial intelligence to reduce the workload of medical staff has become an inevitable trend. Several studies have applied convolutional neural networks (CNNs) to the classification of fundus diseases; however, with the advancement of Transformer applications, the Vision Transformer (ViT) model has shown higher performance on medical images. ViT models require pretraining on large datasets, which is limited by the high cost of medical image acquisition. This study therefore proposes an ensemble model that combines a CNN (EfficientNetV2-S) and a Transformer (ViT). Compared with existing advanced models, the proposed model extracts fundus image features in two completely different ways to achieve better classification results, with high accuracy, precision, and sensitivity. Specifically, it can be used to diagnose fundus diseases. If applied to computer-aided diagnosis, the model can improve the efficiency of primary-care doctors, thus effectively alleviating the difficulties in diagnosing fundus diseases caused by the shortage of ophthalmologists, long treatment processes, and difficult access to care in remote areas.

Methods: We propose the EfficientNet-ViT ensemble model for fundus image classification. The model integrates a CNN and a Transformer model, namely EfficientNetV2-S and ViT, respectively. First, the EfficientNetV2-S and ViT models are trained. Then, adaptive weighted data fusion is applied so that the two types of models complement each other: the adaptive weighting algorithm calculates the optimal weighting factors of the EfficientNetV2-S and ViT models, and the two are integrated into the new EfficientNet-ViT model. With the calculated weighting factors of 0.6 and 0.4, the output of the EfficientNetV2-S model is multiplied by 0.6, the output of the ViT model is multiplied by 0.4, and the two weighted outputs are summed to obtain the final prediction. According to clinical statistics, the common fundus diseases in China include diabetic retinopathy (DR), age-related macular degeneration (ARMD), cataract, and myopia, and these diseases are the main causes of irreversible blindness in China. We therefore classify fundus images into five categories: normal, DR, ARMD, myopia, and cataract. Performance is evaluated with three indicators: accuracy, precision, and specificity. Because the EfficientNet-ViT ensemble model extracts fundus image features in two completely different ways, it achieves better classification results and higher accuracy. Finally, we compare the performance indicators of this model with those of other models, verifying the superiority of the ensemble model in fundus image classification.

Results and Discussions: The EfficientNet-ViT ensemble model achieves an accuracy of 92.7%, a precision of 88.3%, and a specificity of 98.1% in fundus image classification. Compared with the EfficientNetV2-S and ViT models, the precision of the EfficientNet-ViT ensemble model improves by 0.5% and 1.6%, its accuracy by 0.7% and 1.9%, and its specificity by 0.6% and 0.9%, respectively (Table 3). Compared with ResNet50, DenseNet121, ResNeSt-101, and EfficientNet-B0, the accuracy of the EfficientNet-ViT ensemble model increases by 5.4%, 3.2%, 2.0%, and 1.4%, respectively (Table 4), showing its superiority in the fundus image classification task.

Conclusions: The EfficientNet-ViT ensemble model proposed in this study combines a CNN and a Transformer. The core of a CNN is the convolution kernel, which has inductive biases such as translation invariance and locality; it can capture local spatial information but lacks a global understanding of the image. In contrast, the self-attention mechanism of the Transformer is not limited to local interactions: it can mine long-range dependencies and can also be computed in parallel. This study uses the adaptive weighted fusion method to calculate the optimal weighting factors for the CNN (EfficientNetV2-S) and Transformer (ViT) models, so that EfficientNet-ViT extracts image features in two completely different ways. Our experimental results show that integrating the two models improves the accuracy and precision of fundus image classification. If applied in computer-aided diagnosis, this model can improve the efficiency of ophthalmologists and effectively alleviate the difficulties in diagnosing fundus diseases caused by the shortage of ophthalmic medical staff, long waits for treatment, and difficult access to care in remote areas of China. When more data are used to train the model in the future, the accuracy, precision, and sensitivity of automatic classification may improve further, achieving better clinical results.
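The Methods paragraph describes weighted soft voting: each model's class probabilities are scaled by its weighting factor (0.6 for EfficientNetV2-S, 0.4 for ViT) and summed, and the class with the largest fused score is the prediction. The Python sketch below illustrates that step together with the three reported metrics. It is not the authors' code, and several parts are assumptions. The abstract does not give the adaptive weighted fusion formula, so the hypothetical helper `adaptive_weights` uses the classic inverse-variance rule w_i = (1/σ_i²) / Σ_j (1/σ_j²), estimating σ_i² as each model's mean squared error against one-hot labels on a validation set. Precision and specificity are assumed to be one-vs-rest and macro-averaged over classes. All function names and the probability values in the demo are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]  # rows: images, columns: softmax class probabilities


def adaptive_weights(prob_sets: Sequence[Matrix], labels: Sequence[int]) -> List[float]:
    """Inverse-variance fusion weights w_i = (1/s_i^2) / sum_j (1/s_j^2).

    ASSUMPTION: s_i^2 is estimated as model i's mean squared error between its
    softmax output and the one-hot label on a validation set (the paper's exact
    adaptive weighting formula is not given in the abstract).
    """
    inverse_errors = []
    for probs in prob_sets:
        total = 0.0
        for row, y in zip(probs, labels):
            total += sum((p - (1.0 if c == y else 0.0)) ** 2 for c, p in enumerate(row))
        mse = max(total / len(labels), 1e-12)  # avoid division by zero for a perfect model
        inverse_errors.append(1.0 / mse)
    norm = sum(inverse_errors)
    return [v / norm for v in inverse_errors]


def soft_vote(prob_sets: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted soft voting: fused[n][c] = sum_i weights[i] * prob_sets[i][n][c]."""
    n_samples, n_classes = len(prob_sets[0]), len(prob_sets[0][0])
    return [
        [sum(w * probs[n][c] for w, probs in zip(weights, prob_sets)) for c in range(n_classes)]
        for n in range(n_samples)
    ]


def predict(probs: Matrix) -> List[int]:
    """Index of the highest-scoring class for each image."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def metrics(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> Dict[str, float]:
    """Accuracy, plus one-vs-rest precision and specificity macro-averaged over classes (ASSUMPTION)."""
    pairs = list(zip(y_true, y_pred))
    precisions, specificities = [], []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        specificities.append(tn / (tn + fp) if tn + fp else 0.0)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n_classes,
        "specificity": sum(specificities) / n_classes,
    }


if __name__ == "__main__":
    # Hypothetical softmax outputs for 4 images and 3 classes (not data from the paper).
    effnet = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.3, 0.6], [0.2, 0.6, 0.2]]
    vit = [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1], [0.2, 0.5, 0.3], [0.1, 0.8, 0.1]]
    labels = [0, 0, 2, 1]

    fused = soft_vote([effnet, vit], [0.6, 0.4])  # weighting factors reported in the abstract
    for name, probs in [("EfficientNetV2-S", effnet), ("ViT", vit), ("ensemble", fused)]:
        print(name, predict(probs), metrics(labels, predict(probs), n_classes=3))
    print("validation-based weights:", adaptive_weights([effnet, vit], labels))
```

In this toy example, EfficientNetV2-S alone misclassifies the second image and ViT alone misclassifies the third, but the 0.6/0.4 fused scores classify all four correctly, which is the kind of complementarity the ensemble relies on. For the paper's task, the five categories (normal, DR, ARMD, myopia, cataract) would correspond to `n_classes=5`.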
Authors: Yuan Yuan; Chen Minghui; Ke Shuting; Wang Teng; He Longxi; Lü Linjie; Sun Hao; Liu Jiannan (Shanghai Engineering Research Center of Interventional Medical, Ministry of Education of Medical Optical Engineering Center, School of Health Sciences and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source: Chinese Journal of Lasers, 2022, Issue 20, pp. 102-110 (9 pages). Indexed in EI, CAS, and CSCD; Peking University core journal.
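Funding: Industry-University-Research-Medicine Project of the Science and Technology Commission of Shanghai Municipality (15DZ1940400).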
Keywords: bio-optics; ophthalmology; fundus disease; image classification; ensemble model; weighted fusion