Abstract
Objective To evaluate the classification performance of the XGBoost algorithm on binary, class-imbalanced, high-dimensional omics data. Methods XGBoost was compared with four other methods, random forest (RF), support vector machine (SVM), random under-sampling, and stochastic gradient boosting trees (SGBT), using simulation experiments and real metabolomics data. Results In the simulation experiments, XGBoost was superior or non-inferior to the other four algorithms under all experimental conditions when the class imbalance was pronounced. It also classified well when the classes were close to balanced, and it showed some robustness to noise variables. In the real-data analysis, XGBoost had the best classification performance of the five methods and ran faster without sacrificing classification performance. Conclusion XGBoost is suitable for discriminant analysis of class-imbalanced, high-dimensional omics data and merits further study.
Authors
Lu Yaxin (卢娅欣)
Huang Yue (黄月)
Li Kang (李康)
Department of Medical Statistics, Harbin Medical University, Harbin 150081
Source
Chinese Journal of Health Statistics (《中国卫生统计》)
CSCD
PKU Core (北大核心)
2021, No. 1, pp. 21-24 (4 pages)
Funding
National Natural Science Foundation of China (81973149, 81773551).