Abstract
As an extension of the neighborhood rough set, the k-nearest neighborhood-based rough set is widely used in knowledge discovery and related fields. Its granules are constructed by selecting, for each sample, the k nearest samples. However, the traditional k-nearest neighborhood-based granule cannot effectively handle data whose samples are unevenly distributed, that is, data with varying density. In addition, the one-way granule construction method allows some outliers to be assigned to a granule, which increases the uncertainty of the granule. To address these problems and improve the stability of the granule model, this paper proposes a mutual k-nearest neighborhood-based rough set model fused with a sparsity constraint. First, the relationships between samples are characterized by a sparse constraint model, and closely related samples are selected to construct the sparse mutual k-nearest neighborhood-based granule. Then, based on a mutual-neighbor information strategy, samples that do not conform to the strategy are removed from the model. Finally, the degree of uncertainty of the granules is characterized by conditional entropy and mutual information entropy. Experimental results on UCI datasets show that the proposed model can reduce the uncertainty of information, and that it offers a new direction for improving k-nearest neighborhood-based rough set models.
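The mutual-neighbor step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sparse constraint model is omitted because the abstract does not specify it, plain Euclidean distance is assumed in its place, and the entropy used is one common neighborhood-style conditional entropy chosen for illustration. All function names, the toy data, and the value of `k` are hypothetical.

```python
import math

def knn(points, i, k):
    """Indices of the k samples nearest to sample i (Euclidean distance, i excluded)."""
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return {j for _, j in dists[:k]}

def one_way_granules(points, k):
    """One-way granule of i: i together with its k nearest samples."""
    return [{i} | knn(points, i, k) for i in range(len(points))]

def mutual_knn_granules(points, k):
    """Mutual granule of i: keep j only if j is in kNN(i) AND i is in kNN(j)."""
    neigh = [knn(points, i, k) for i in range(len(points))]
    return [
        {i} | {j for j in neigh[i] if i in neigh[j]}
        for i in range(len(points))
    ]

def conditional_entropy(granules, labels):
    """Assumed definition: -(1/n) * sum_i log2(|G_i ∩ class(x_i)| / |G_i|).

    Each granule contains its own sample, so the ratio is never zero.
    """
    total = 0.0
    for i, g in enumerate(granules):
        same = sum(1 for j in g if labels[j] == labels[i])
        total -= math.log2(same / len(g))
    return total / len(granules)

# Toy data (hypothetical): two tight clusters plus one isolated sample (index 6).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = [0, 0, 0, 1, 1, 1, 1]
k = 2

one_way = one_way_granules(points, k)
mutual = mutual_knn_granules(points, k)

print("one-way granule of sample 6:", sorted(one_way[6]))
print("mutual granule of sample 6: ", sorted(mutual[6]))
print("one-way conditional entropy:", conditional_entropy(one_way, labels))
print("mutual conditional entropy: ", conditional_entropy(mutual, labels))
```

In this toy setting the isolated sample's one-way granule reaches into a cluster of the other class, whereas its mutual granule contains only itself, so the assumed entropy measure is lower under the mutual rule. This only illustrates the general idea and says nothing about the results reported in the paper.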
Authors
樊晓雪
尹涛
陆杨
鞠恒荣
丁卫平
FAN Xiaoxue; YIN Tao; LU Yang; JU Hengrong; DING Weiping (School of Information Science and Technology, Nantong University, Nantong 226019, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2024, No. 10, pp. 2370-2377 (8 pages)
Journal of Chinese Computer Systems
Funding
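Supported by the National Natural Science Foundation of China (62006128, 61976120)
Supported by the Natural Science Foundation of Jiangsu Province (BK20191445)
Supported by the Jiangsu Province Double-Innovation Doctor Program ((2020)30986)
Supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX21_1447)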
Keywords
k-nearest neighborhood rough set
sparsity constraint function
mutual strategy
conditional entropy
mutual information entropy