Abstract
Outlier detection is an important task in knowledge discovery. After analyzing distance-based outliers and the algorithms for detecting them, this paper proposes a new definition for identifying outliers and designs a sampling-based approximate detection algorithm, which is evaluated experimentally on real data. The experimental results show that the new definition not only yields the same results as the DB(p,d) outlier definition but also simplifies what outlier detection requires of the user, and that it additionally gives the degree to which each data object is outlying within the dataset.
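For readers unfamiliar with the DB(p,d) notion mentioned above, the following is a minimal Python sketch of the commonly cited distance-based definition: an object is a DB(p,d) outlier if at least a fraction p of the objects in the dataset lie farther than distance d from it. It is not the new definition or the algorithm proposed in the paper, neither of which is given in this abstract. The function names, the Euclidean distance, and the toy data are assumptions made for illustration only. The sampled variant merely shows the general idea of estimating the fraction from a random sample.

```python
import math
import random


def far_fraction(point, objects, d):
    """Fraction of `objects` lying farther than distance d from `point`."""
    far = sum(1 for q in objects if math.dist(point, q) > d)
    return far / len(objects)


def is_db_outlier(point, data, p, d):
    """DB(p,d) test: at least fraction p of the dataset is farther than d."""
    return far_fraction(point, data, d) >= p


def sampled_far_fraction(point, data, d, sample_size, rng):
    """Estimate the far fraction from a random sample (illustrative only)."""
    sample = rng.sample(data, min(sample_size, len(data)))
    return far_fraction(point, sample, d)


# Hypothetical toy data: four clustered points and one distant point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
flags = [is_db_outlier(x, data, p=0.7, d=1.0) for x in data]
estimate = sampled_far_fraction(data[4], data, 1.0, 3, random.Random(0))
```

With these toy values only the distant point is flagged. Both p and d must be chosen by the user, which is the kind of requirement the abstract says the new definition simplifies.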
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2004, Issue 33, pp. 73-75, 94 (4 pages in total)