Abstract
Feature selection based on neighborhood rough sets is constrained by the choice of the neighborhood parameter. To address this, the maximal nearest neighbor is introduced to determine each sample's neighborhood, and a maximal nearest-neighbor rough set model is constructed. On this basis, a feature selection method based on maximal nearest-neighbor rough approximation is proposed. First, the distances from a sample to its nearest hit (the nearest sample of the same class) and its nearest miss (the nearest sample of a different class) are computed to determine the size of its nearest-neighbor class. Second, the properties of the maximal nearest-neighbor class are analyzed to derive a fast method for computing the positive region of samples. Finally, a forward greedy search strategy is used to construct the feature selection algorithm. The algorithm not only avoids the uncertain choice of the neighborhood parameter but also reduces the number of positive-region judgments per sample. Experiments with three different classifiers on eight UCI datasets show that the model selects fewer features while effectively improving classification performance.
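The three steps named in the abstract (nearest hit and nearest miss distances, a positive-region test, forward greedy search) can be illustrated with a minimal Python sketch. The abstract does not state the exact definitions, so the following are assumptions made only for illustration: a sample is counted in the positive region when its nearest hit is strictly closer than its nearest miss, distances are Euclidean, and the search stops when the fraction of positive-region samples no longer increases. The function names are hypothetical, and the sketch does not reproduce the paper's fast positive-region method.

```python
from math import dist, inf

def in_positive_region(X, y, i, feats):
    """Assumed test (not taken from the paper): sample i is consistent
    if its nearest hit is strictly closer than its nearest miss."""
    xi = [X[i][f] for f in feats]
    hit, miss = inf, inf
    for j, row in enumerate(X):
        if j == i:
            continue
        d = dist(xi, [row[f] for f in feats])  # Euclidean distance on the chosen features
        if y[j] == y[i]:
            hit = min(hit, d)    # nearest same-class sample
        else:
            miss = min(miss, d)  # nearest different-class sample
    return hit < miss

def dependency(X, y, feats):
    """Fraction of samples that fall in the (assumed) positive region."""
    return sum(in_positive_region(X, y, i, feats) for i in range(len(X))) / len(X)

def forward_greedy_select(X, y):
    """Add one feature at a time, keeping the one that raises dependency most."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        gains = [(dependency(X, y, selected + [f]), f) for f in remaining]
        score, f = max(gains, key=lambda t: t[0])  # first maximum wins on ties
        if score <= best:  # assumed stopping rule: no further improvement
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
print(forward_greedy_select(X, y))
```

On this toy data the search keeps only feature 0, since adding the noisy feature 1 pulls different-class samples closer and lowers the dependency. No neighborhood radius is supplied anywhere, which is the property the abstract emphasizes.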
Source
《小型微型计算机系统》
CSCD
PKU Core Journal
2015, No. 8, pp. 1832-1836 (5 pages)
Journal of Chinese Computer Systems
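Funding
Supported by the National Natural Science Foundation of China (61303131, 61379021)
Supported by the Natural Science Foundation of Fujian Province (2013J01028)
Supported by the Science and Technology Project of the Education Department of Fujian Province (JA14192)
Supported by the Zhangzhou Science and Technology Project (ZZ2013J04)
Supported by the Graduate Research Innovation Fund of Minnan Normal University (YJS201433)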
Keywords
feature selection
maximal nearest-neighbor
neighborhood rough sets