
Application of layer by layer Transformer in class-imbalanced data (cited by: 1)
Abstract: Class imbalance in clinical medical scale data tends to degrade model performance, and deep learning frameworks struggle to match traditional machine learning methods on scale (tabular) data. To address both problems, this paper proposes a layer by layer Transformer (LLT) network model based on cascaded under-sampling. LLT removes majority-class samples layer by layer through cascaded under-sampling, which balances the classes and reduces the effect of class imbalance on the classifier. It also uses an attention mechanism to evaluate the relevance of the input features, which performs feature selection, refines feature extraction, and improves model performance. Rheumatoid arthritis (RA) data serve as the test samples. Experiments show that, without changing the sample distribution, the proposed cascaded under-sampling method raises the recognition rate of the minority class by 6.1%, which is 1.4% and 10.4% higher than the commonly used NEARMISS and ADASYN, respectively. On the RA scale data, LLT reaches an accuracy of 72.6% and an F1-score of 71.5%, with an AUC of 0.89 and an mAP of 0.79, outperforming mainstream tabular data classification models such as RF, XGBoost, and GBDT. Finally, the model's process is visualized and the features that affect RA are analyzed, which offers useful guidance for the clinical diagnosis of RA.
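The abstract describes cascaded under-sampling only at a high level: majority-class samples are removed layer by layer until the classes are balanced. The sketch below is one possible reading of that idea, not the authors' implementation. The function name `cascade_undersample`, the geometric shrink schedule, and the rule for choosing which samples to drop (those farthest from the minority centroid go first) are all assumptions introduced for illustration.

```python
import math


def cascade_undersample(X, y, majority_label, n_layers=3):
    """Shrink the majority class in n_layers steps down to the minority size.

    Assumption (not stated in the abstract): at each layer, the majority
    samples farthest from the minority centroid are dropped first.
    Returns a list of (X_layer, y_layer) pairs, one per layer.
    """
    minority = [x for x, label in zip(X, y) if label != majority_label]
    minority_labels = [label for label in y if label != majority_label]
    majority = [x for x, label in zip(X, y) if label == majority_label]
    if not minority or len(majority) <= len(minority):
        return [(list(X), list(y))]  # nothing to balance

    dim = len(minority[0])
    centroid = [sum(x[d] for x in minority) / len(minority) for d in range(dim)]

    # Geometric schedule: the majority count shrinks by the same ratio per layer.
    n_start = len(majority)
    ratio = (len(minority) / n_start) ** (1.0 / n_layers)

    # Rank once: closest to the minority centroid first (kept longest).
    majority.sort(key=lambda x: math.dist(x, centroid))

    layers = []
    for layer in range(1, n_layers + 1):
        keep = max(len(minority), round(n_start * ratio ** layer))
        if layer == n_layers:
            keep = len(minority)  # force exact balance at the last layer
        majority = majority[:keep]
        layers.append((majority + minority,
                       [majority_label] * len(majority) + minority_labels))
    return layers
```

Each returned layer could feed one stage of a classifier. How LLT actually couples the layers to the Transformer is not specified in the abstract.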
Authors: Yang Jingdong; Li Yiwei; Jiang Biao; Jiang Quan; Han Man; Song Mengge (School of Optical-Electrical & Computer Engineering, University of Shanghai for Science & Technology, Shanghai 200093, China; Guang'anmen Hospital, China Academy of Chinese Medical Science, Beijing 100053, China)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2023, No. 10, pp. 3047-3052 (6 pages)
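Funding: National Natural Science Foundation of China (81973749); Science and Technology Innovation Project of the China Academy of Chinese Medical Sciences (CI2021A01503).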
Keywords: tabular data classification; class-imbalance; cascaded under-sampling; Transformer
