Hybrid Feature Selection Method Based on Maximum Information Coefficient and Iterative XGBoost

Cited by: 2
Abstract: Experimental data on the material basis of traditional Chinese medicine (TCM) are typically high-dimensional with few samples, and they contain a large amount of irrelevant and redundant information, which makes it difficult to mine TCM substance information in depth. This paper proposes a hybrid feature selection method based on the maximum information coefficient (MIC) and iterative XGBoost. The method first uses MIC to measure the correlation between each feature and the target variable, then filters out irrelevant features according to an evaluation criterion to obtain a candidate feature subset. The candidate subset is then sorted and partitioned, and XGBoost is applied to the partitions in turn to iteratively remove redundant features, yielding an effective feature subset. Experimental results show that the method selects a small number of highly interpretable features and adapts well to experimental data on the material basis of TCM.
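The two-stage procedure described in the abstract can be sketched roughly as follows. This is only an illustrative skeleton, not the authors' implementation: the abstract does not state the evaluation criterion, the partition size, or the elimination rule, so `threshold`, `batch_size`, and the rule "drop a feature if the score does not fall" are assumptions. To keep the sketch free of dependencies, absolute Pearson correlation stands in for MIC and a leave-one-out 1-nearest-neighbour score stands in for an XGBoost-based evaluation; both are passed in as functions so that real MIC and XGBoost scorers could be substituted.

```python
import math


def pearson_abs(xs, ys):
    """Absolute Pearson correlation. ASSUMPTION: a simple stand-in for MIC."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((v - my) ** 2 for v in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def loo_1nn_score(columns, y):
    """Negative leave-one-out 1-NN squared error (higher is better).

    ASSUMPTION: a stand-in for a cross-validated XGBoost score.
    """
    n = len(y)
    err = 0.0
    for i in range(n):
        best_j, best_d = None, float("inf")
        for j in range(n):
            if j == i:
                continue
            d = sum((c[i] - c[j]) ** 2 for c in columns)
            if d < best_d:
                best_d, best_j = d, j
        err += (y[i] - y[best_j]) ** 2
    return -err / n


def hybrid_select(columns, y, threshold=0.3, batch_size=2,
                  relevance=pearson_abs, evaluate=loo_1nn_score):
    """Filter by relevance, then prune redundant features batch by batch."""
    # Stage 1 (filter): keep features whose relevance reaches the threshold,
    # sorted from most to least relevant.
    scores = {j: relevance(col, y) for j, col in enumerate(columns)}
    candidates = sorted((j for j in scores if scores[j] >= threshold),
                        key=lambda j: -scores[j])

    # Stage 2 (wrapper): add one partition at a time, then repeatedly drop
    # any feature whose removal does not lower the evaluation score.
    selected = []
    for start in range(0, len(candidates), batch_size):
        selected += candidates[start:start + batch_size]
        improved = True
        while improved and len(selected) > 1:
            improved = False
            base = evaluate([columns[j] for j in selected], y)
            for j in reversed(selected):  # try the least relevant first
                trial = [k for k in selected if k != j]
                if evaluate([columns[k] for k in trial], y) >= base:
                    selected = trial
                    improved = True
                    break
    return selected


# Toy data: f1 duplicates f0 (redundant), f2 is uncorrelated with y.
f0 = [float(v) for v in range(8)]
f1 = list(f0)
f2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
y = [2.0 * v for v in f0]
selected = hybrid_select([f0, f1, f2], y)
print(selected)  # [0]
```

On this toy data the filter stage discards `f2`, and the wrapper stage drops the duplicate `f1`, leaving only feature 0. Passing the scorers as arguments keeps the skeleton independent of any particular MIC or XGBoost library.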
Authors: Xiong Lingzhu; Qiu Weihan; Luo Jigen; Li Keding (College of Computer Science, Jiangxi University of Chinese Medicine, Nanchang 330004, Jiangxi, China; South China Normal University, Guangzhou 510631, Guangdong, China; Xiamen Xian Yue Hospital, Xiamen 361012, Fujian, China)
Source: Computer Applications and Software (《计算机应用与软件》), PKU Core Journal, 2023, No. 1, pp. 280-286, 305 (8 pages)
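Funding: National Natural Science Foundation of China (61363042, 61562045, 61762051); Key Project of the Jiangxi Provincial Key Research and Development Program (20171ACE50021); Science and Technology Research Project of the Jiangxi Provincial Department of Science and Technology (GJJ190683); Jiangxi Provincial Graduate Innovation Special Fund Project (YC2018-S281).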
Keywords: high-dimensional small sample; feature selection; MIC; iterative XGBoost; TCM information