Application of XGBoost to the Analysis of Class-imbalanced High-dimensional Omics Data

Cited by: 5

Abstract

Objective: To evaluate the classification performance of the XGBoost algorithm on binary, class-imbalanced, high-dimensional data.

Methods: Using simulation experiments and a real metabolomics dataset, five methods were compared: XGBoost, random forest (RF), support vector machine (SVM), random under-sampling, and stochastic gradient boosting trees (SGBT).

Results: In the simulations, when the class imbalance was pronounced, XGBoost performed better than or no worse than the other four methods under all experimental conditions. It also classified well when the classes were close to balanced, and it showed some robustness to noise variables. In the real-data analysis, XGBoost had the best classification performance of the five methods and ran faster without sacrificing classification performance.

Conclusion: XGBoost is suitable for discriminant analysis of class-imbalanced high-dimensional data and merits further study.
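Random under-sampling, one of the five methods compared above, can be illustrated with a minimal sketch. The record does not describe the authors' implementation, so everything below is an assumption made for illustration: the function name `random_undersample`, the 10-versus-90 toy data, and the choice to balance the classes exactly 1:1.

```python
import random
from collections import Counter


def random_undersample(X, y, minority_label=1, seed=0):
    """Return a class-balanced subset by randomly dropping majority-class rows.

    Hypothetical helper for illustration; not the method used in the paper.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    majority_idx = [i for i, label in enumerate(y) if label != minority_label]
    # Keep every minority sample plus an equally sized random draw
    # (without replacement) from the majority class.
    n_keep = min(len(minority_idx), len(majority_idx))
    kept_majority = rng.sample(majority_idx, k=n_keep)
    keep = sorted(minority_idx + kept_majority)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (made up): 10 positives and 90 negatives, 5 features per sample.
data_rng = random.Random(42)
y = [1] * 10 + [0] * 90
X = [[data_rng.gauss(label, 1.0) for _ in range(5)] for label in y]

X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))  # class counts after under-sampling
```

The balanced subset would then be passed to an ordinary classifier. The trade-off is that most majority-class samples are discarded, which is one reason to compare this approach against methods that train on the full data.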

Authors: Lu Yaxin; Huang Yue; Li Kang (Department of Medical Statistics, Harbin Medical University, Harbin 150081)

Affiliation: Department of Medical Statistics, Harbin Medical University

Source: Chinese Journal of Health Statistics (indexed in CSCD and the Peking University Core Journals list), 2021, No. 1, pp. 21-24 (4 pages)

Funding: National Natural Science Foundation of China (81973149, 81773551).

Keywords: XGBoost (extreme gradient boosting); high-dimensional omics data; classification

Classification number: R195.1 [Medicine and Health: Health Statistics]
