Abstract
Feature selection algorithms are an important tool for microarray data analysis, and their classification performance and stability are critical to that analysis. To improve both, we propose an ensemble feature selection algorithm for high-dimensional microarray data that compensates for the limited information carried by any single gene subset. The algorithm first uses the signal-to-noise ratio method to select several discriminative genes. Then, for each discriminative gene, it uses a conditional information correlation coefficient to evaluate the correlation between candidate genes and that discriminative gene, producing multiple relevant gene subsets. Finally, it integrates these similar gene subsets through ensemble learning. Experimental results show that, in most cases, the classification performance and stability of the proposed algorithm are superior to those of methods that select only a single gene subset.
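The three stages named in the abstract (signal-to-noise ranking, correlation-based subset construction, ensemble integration) can be illustrated with a minimal Python sketch. This record gives no formulas, so several pieces are assumptions rather than the authors' method: the signal-to-noise ratio is taken as the common two-class form |mean0 - mean1| / (sd0 + sd1); an averaged within-class absolute Pearson correlation stands in for the paper's conditional information correlation coefficient; and a majority vote over nearest-centroid classifiers stands in for the unspecified ensemble technique. All function and parameter names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def snr_scores(X, y):
    """Two-class signal-to-noise ratio per gene (assumed form, labels 0 and 1)."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        # Small constant avoids division by zero for constant genes.
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12))
    return scores


def pearson(u, v):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def class_conditional_corr(X, y, g, h):
    """Stand-in measure (an assumption): mean within-class |Pearson| of genes g, h."""
    values = []
    for c in sorted(set(y)):
        u = [row[g] for row, label in zip(X, y) if label == c]
        v = [row[h] for row, label in zip(X, y) if label == c]
        values.append(abs(pearson(u, v)))
    return mean(values)


def build_gene_subsets(X, y, n_seeds=2, subset_size=2):
    """Pick the top-SNR genes as seeds, then grow one gene subset around each seed."""
    scores = snr_scores(X, y)
    seeds = sorted(range(len(scores)), key=lambda g: -scores[g])[:n_seeds]
    subsets = []
    for s in seeds:
        others = [g for g in range(len(scores)) if g != s]
        others.sort(key=lambda g: -class_conditional_corr(X, y, s, g))
        subsets.append([s] + others[: subset_size - 1])
    return subsets


def nearest_centroid_predict(X, y, subset, sample):
    """Base learner: assign the class whose centroid (on `subset`) is closest."""
    best_class, best_dist = None, float("inf")
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        centroid = [mean(r[g] for r in rows) for g in subset]
        dist = sum((sample[g] - m) ** 2 for g, m in zip(subset, centroid))
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class


def ensemble_predict(X, y, subsets, sample):
    """Integrate the subsets by majority vote over one base learner per subset."""
    votes = [nearest_centroid_predict(X, y, s, sample) for s in subsets]
    return Counter(votes).most_common(1)[0][0]
```

To use the sketch, pass an expression matrix `X` (one row per sample, one column per gene) and binary labels `y` to `build_gene_subsets`, then classify new samples with `ensemble_predict`. Growing one subset per seed gene means the final decision does not rest on a single gene subset, which is the motivation the abstract gives for the ensemble design.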
Source
《计算机工程与科学》
CSCD
Peking University Core Journals
2016, No. 7, pp. 1330-1337 (8 pages)
Computer Engineering & Science
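Funding
National Natural Science Foundation of China (51174257/F030504)
Fundamental Research Funds for the Central Universities (2013BHZX0040)
Anhui Provincial Research Institution Commissioned Special Key Project (2013WLGH01ZD)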
Keywords
microarray data
signal-to-noise ratio
conditional correlation coefficient
feature selection