摘要
Relief算法为系列特征选择方法,包括最早提出的Relief算法和后来拓展的ReliefF算法,核心思想是对分类贡献大的特征赋予较大的权值;特点是算法简单,运行效率高,因此有着广泛的应用。但直接将Relief算法应用于有干扰的数据集或不平衡数据集,效果并不理想。基于Relief算法,提出一种干扰数据特征选择算法,称为阈值-Relief算法,有效消除了干扰数据对分类结果的影响。结合K-means算法,提出两种不平衡数据集特征选择算法,分别称为K-means-ReliefF算法和K-means-Relief抽样算法,有效弥补了Relief算法在不平衡数据集上表现出的不足。实验证明了本文算法的有效性。
Relief algorithm is a series of feature selection method. It includes the basic principle of Relief algorithm and its later extensions reliefF algotithm. Its core concept is to weight more on features that have essential contributions to classification. Relief algorithm is simple and efficient, thus being widely used. However, algorithm performance is not satisfied when applying the algorithm to noisy and unbal- anced datasets. In this paper, based on the Relief algorithm, a feature selection method is proposed, called threshold-Relief algorithm, which eliminates the influence of noisy data on classification results. Combining with the K-means algorithm, two unbalanced datasets feature selection methods are pro- posed, called K-means-ReliefF algorithm and K-means-relief sampling algorithm, respectively, which can compensate for the poor performance of Relief algorithm in unbalanced datasets. Experiments show the effectiveness of the proposed algorithms.
出处
《数据采集与处理》
CSCD
北大核心
2016年第4期838-844,共7页
Journal of Data Acquisition and Processing
基金
国家自然科学基金(61273294)资助项目
山西省科技基础条件平台(2014091004-0104)资助项目