Abstract
With the rapid development of machine learning and bioinformatics, cancer subtype classification has become an active research topic. Subtype classification can guide cancer treatment and prognosis. In recent years, many supervised learning methods have been applied to cancer subtype classification. Because the data are high-dimensional, contain few samples, and are imbalanced across classes, this paper first uses LDA for dimensionality reduction, then uses the SMOTE algorithm to balance the data, and then uses an Extra-Trees model to classify cancer subtypes. Finally, 3296 samples covering 25 subtypes of 9 cancers from TCGA are used to verify the effectiveness of the model. The experimental results show that the proposed model performs well on cancer subtype classification.
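The data-balancing step named in the abstract can be illustrated with a minimal sketch of SMOTE-style oversampling: each synthetic sample is a random point on the line segment between a minority-class sample and one of its nearest same-class neighbours. This is not the authors' implementation. The function names `smote_oversample` and `balance`, the parameter `k`, and the toy data are assumptions made for illustration; a real study would more likely rely on an established library.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length feature lists, all from one class.
    k: number of nearest same-class neighbours considered per base sample.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two samples of a class to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other same-class samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Take a random point on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(samples, labels, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(pts) for pts in by_class.values())
    out_x, out_y = list(samples), list(labels)
    for y, pts in by_class.items():
        new_pts = smote_oversample(pts, target - len(pts), k=k, seed=seed)
        out_x.extend(new_pts)
        out_y.extend([y] * len(new_pts))
    return out_x, out_y


# Toy data (hypothetical): class "A" has 3 samples, class "B" has 5.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [5.5, 5.5]]
y = ["A", "A", "A", "B", "B", "B", "B", "B"]
Xb, yb = balance(X, y, k=2)
```

In a pipeline like the one the abstract describes, oversampling of this kind would normally be applied to the training split only, so that synthetic points do not leak into the evaluation data.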
Authors
DUAN Hua (段华)
HUANG Junshuai (黄军帅)
ZHANG Shan (张珊)
College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, Shandong 266590, China
Source
Mathematical Modeling and Its Applications (《数学建模及其应用》)
2021, No. 3, pp. 23-29 (7 pages)
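Funding
National Natural Science Foundation of China (61202152)
Shandong Province Science and Technology Development Project (ZR202102230289, ZR202102250695, ZR2019LZH001, ZR2017MF027)
Taishan Scholar Program of Shandong Province (special fund)
Scientific Research Innovation Team Support Program of Shandong University of Science and Technology (2015TDJH102)
Excellent Teaching Team Construction Program (JXTD20180505).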