Abstract
Tumor gene data are high-dimensional with small sample sizes. To improve the accuracy of traditional gene classification methods on such data, a feature gene selection method combining random forest and neighborhood rough sets (Random Forest and Neighborhood Rough Set, RFNRS) is proposed. First, the method applies the Relief algorithm to weight the features of the original tumor gene data and removes the low-weight features. Second, a wrapper feature selection algorithm based on random forest (Random Forest Wrapper Feature Select, RFWFS), which uses model accuracy as its evaluation criterion, is introduced to screen the feature subset. Then, a neighborhood rough set is introduced to optimize the resulting subset of continuous-valued features. Finally, several classic classification algorithms are applied to the selected feature subset. Experimental results show that the method performs well in selecting feature subsets of tumor genes and also improves the classification performance of the classic algorithms.
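The filtering and reduction stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`relief_weights`, `neighborhood_dependency`, `greedy_reduct`), the toy data, the neighborhood radius `delta = 0.3`, and the choice to visit every sample once instead of sampling at random in Relief are all assumptions made for illustration. The random forest wrapper stage (RFWFS) is omitted here.

```python
from math import sqrt


def relief_weights(X, y):
    """Relief-style feature weights (assumed deterministic variant:
    every sample is visited once instead of being drawn at random)."""
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda k: dist(X[i], X[k]))    # nearest same-class sample
        m = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest other-class sample
        for j in range(d):
            # Reward features that differ across classes, penalize those
            # that differ within a class.
            w[j] += (diff(X[i], X[m], j) - diff(X[i], X[h], j)) / n
    return w


def neighborhood_dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood, measured by Euclidean
    distance on `features`, contains only samples with the same label."""
    n = len(X)
    pure = 0
    for i in range(n):
        consistent = True
        for k in range(n):
            d2 = sum((X[i][j] - X[k][j]) ** 2 for j in features)
            if sqrt(d2) <= delta and y[k] != y[i]:
                consistent = False
                break
        if consistent:
            pure += 1
    return pure / n


def greedy_reduct(X, y, candidates, delta):
    """Forward selection: add the feature that raises the dependency most,
    and stop when no remaining feature improves it."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        scored = [(neighborhood_dependency(X, y, selected + [f], delta), f)
                  for f in remaining]
        gamma, f = max(scored, key=lambda t: t[0])
        if gamma <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = gamma
    return selected, best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.10, 0.50], [0.20, 0.10], [0.15, 0.90],
     [0.90, 0.40], [0.80, 0.95], [0.85, 0.20]]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(relief_weights(X, y))
    print(greedy_reduct(X, y, [0, 1], 0.3))
```

In a pipeline like the one the abstract describes, the Relief weights would be thresholded first, and the greedy reduction would run only on the features that survive the intermediate wrapper stage. The threshold and the radius would need tuning per dataset.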
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2017, No. 6, pp. 1358-1362 (5 pages)
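Funding
Supported by the National Natural Science Foundation of China (61163010)
Supported by the Natural Science Foundation of Gansu Province (1308RJZA111)
Supported by the Lanzhou Science and Technology Plan Project (2015-2-99)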