Abstract
This paper presents the basic definitions and properties of neighborhood rough sets and neighborhood information entropy. To avoid the loss of feature information caused by discretizing attributes during attribute reduction in numerical-attribute information systems, we propose a new attribute reduction algorithm for numerical attributes based on a neighborhood information entropy measure. The algorithm generates the reduct by expanding the core attribute set of the neighborhood information system (NIS). The neighborhood information entropy measure considers not only changes in the positive region of the candidate reduct, but also how the neighborhood equivalence classes induced by the reduct attributes over the negative-region sample space are distributed across the decision-attribute partition, which gives it finer granularity in measuring neighborhood relations. Experiments on UCI standard datasets show that, compared with attribute reduction methods based on the neighborhood rough set approximation measure, the neighborhood effective information ratio measure, and the neighborhood soft margin measure, the proposed algorithm reduces the attributes of an NIS effectively while preserving better classification accuracy for the reduced attribute set.
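The positive-region notion that the abstract refers to can be illustrated with a minimal sketch. This is not the paper's neighborhood information entropy measure, which the abstract does not define; it is only the standard neighborhood rough set dependency, assuming Euclidean distance and a fixed radius `delta`. The function names `neighborhoods`, `positive_region`, and `dependency`, and the toy data, are hypothetical.

```python
import numpy as np

def neighborhoods(X, delta):
    """Boolean matrix N: N[i, j] is True when sample j lies within
    Euclidean distance delta of sample i (assumed metric and radius)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist <= delta

def positive_region(X, y, attrs, delta):
    """Indices of samples whose entire delta-neighborhood, computed on the
    attribute columns `attrs`, carries the same decision label."""
    N = neighborhoods(X[:, attrs], delta)
    same_label = y[:, None] == y[None, :]
    # A sample is consistent if every neighbor shares its label.
    consistent = (~N | same_label).all(axis=1)
    return np.flatnonzero(consistent)

def dependency(X, y, attrs, delta):
    """Fraction of samples in the positive region (the rest form the
    negative-region sample space mentioned in the abstract)."""
    return len(positive_region(X, y, attrs, delta)) / len(y)

# Toy numerical data: attribute 0 separates the classes, attribute 1 does not.
X = np.array([[0.10, 0.90],
              [0.15, 0.20],
              [0.80, 0.85],
              [0.85, 0.15]])
y = np.array([0, 0, 1, 1])

for attrs in ([0], [1], [0, 1]):
    print(attrs, dependency(X, y, attrs, delta=0.1))
```

No discretization is applied here: the numerical values are compared directly through the distance threshold, which is the motivation the abstract gives for working with neighborhood relations.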
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journal
2016, No. 2, pp. 350-355 (6 pages)
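Funding
National Natural Science Foundation of China, Innovative Research Group Project (70921001)
China Mobile Communications Group Business Support Key Joint R&D Project (2014_LH_21)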
Keywords
attribute reduction
neighborhood information entropy measure
core attribute
neighborhood information system
negative-region sample space
classification accuracy