Abstract
Outlier detection is an important research direction in data mining. Its goal is to identify the small portion of objects in a data set that differ significantly from the rest. It has important applications in network intrusion detection, credit card fraud detection, medical diagnosis, and other fields. In recent years, rough set theory has been widely used for outlier detection. However, the classical rough set model cannot effectively handle numerical and mixed data. To address this, the paper uses the neighborhood rough set model to detect outliers and introduces a new information entropy model for neighborhood rough sets, called neighborhood granular entropy. Based on neighborhood granular entropy, a new outlier detection algorithm, OD_NGE, is proposed. Experimental results show that OD_NGE achieves better outlier detection performance than existing outlier detection algorithms.
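The abstract does not define neighborhood granular entropy or the steps of OD_NGE, so neither is reproduced here. As a minimal sketch of the building block that neighborhood rough sets rely on, the snippet below computes the δ-neighborhood of each object in a numerical data set, i.e. the set of objects within distance δ of it. The Euclidean metric, the function names `delta_neighborhoods` and `neighborhood_sizes`, the value `delta=0.5`, and the toy data are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def delta_neighborhoods(X, delta):
    """For each row of X, return the indices of rows within Euclidean distance delta.

    Illustrative only: the metric and threshold are assumptions, not the paper's settings.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, d)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return [np.flatnonzero(row <= delta) for row in dist]


def neighborhood_sizes(X, delta):
    """Number of objects in each delta-neighborhood (each object counts itself)."""
    return np.array([len(n) for n in delta_neighborhoods(X, delta)])


# Toy data: three points close together and one far away.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
sizes = neighborhood_sizes(X, delta=0.5)
print(sizes)
```

In this toy data the isolated point has a neighborhood containing only itself, while the clustered points share a larger one. That contrast is the kind of granularity information a neighborhood-based outlier score can draw on, although how OD_NGE actually uses it is not stated in the abstract.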
Authors
段珣
杨志勇
江峰
DUAN Xun; YANG Zhi-yong; JIANG Feng (College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China)
Source
《计算机与现代化》
2022, Issue 10, pp. 19-23 (5 pages)
Computer and Modernization
Funding
National Natural Science Foundation of China (61973180, 61671261)
Natural Science Foundation of Shandong Province (ZR2018MF007)
Keywords
outlier detection
neighborhood rough set
knowledge granularity
neighborhood granular entropy
numeric data