Abstract
Outliers are data in a data warehouse that behave abnormally. This paper studies the properties of outliers in high-dimensional space and adopts a strategy of projecting high-dimensional data onto a lower-dimensional space before detection. This addresses the problem that data sparsity in high-dimensional space makes it difficult to identify outliers from the distances between data points. The algorithm selects closely associated dimensions, defines the distance between data points by the k-nearest neighbor, and applies a density-based outlier detection method, so that outliers can be detected more effectively within a local space.
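The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that "closely associated" dimensions can be approximated by high mean absolute Pearson correlation, and it uses a simplified local-outlier-factor (LOF) style score as a stand-in for the density-based method. The function names `select_dims`, `lof_scores`, and `make_demo_data`, the parameters `m` and `k`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_dims(data, m):
    """Assumed heuristic: keep the m dimensions whose summed |correlation|
    with all other dimensions is largest."""
    d = len(data[0])
    cols = [[row[j] for row in data] for j in range(d)]
    score = [
        sum(abs(pearson(cols[i], cols[j])) for j in range(d) if j != i)
        for i in range(d)
    ]
    best = sorted(range(d), key=lambda j: score[j], reverse=True)[:m]
    return sorted(best)


def lof_scores(points, k):
    """Simplified LOF: ratio of the neighbors' mean local density to a point's own."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neigh, kdist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neigh.append(order[:k])              # indices of the k nearest neighbors
        kdist.append(dist[i][order[k - 1]])  # distance to the k-th nearest neighbor
    lrd = []
    for i in range(n):
        # Reachability distance from i to neighbor j is max(k-distance(j), d(i, j)).
        reach = sum(max(kdist[j], dist[i][j]) for j in neigh[i])
        lrd.append(k / (reach + 1e-12))      # small epsilon guards against duplicates
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]


def make_demo_data(n=100, seed=0):
    """Synthetic data: dimensions 0 and 1 are tightly coupled, 2..5 are noise.
    The last row breaks the coupling although each coordinate is in range."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.uniform(0, 10)
        row = [t + rng.gauss(0, 0.2), t + rng.gauss(0, 0.2)]
        row += [rng.uniform(0, 10) for _ in range(4)]
        data.append(row)
    data.append([2.0, 8.0] + [rng.uniform(0, 10) for _ in range(4)])
    return data, n  # n is the index of the planted outlier


data, outlier_index = make_demo_data()
dims = select_dims(data, m=2)
projected = [[row[j] for j in dims] for row in data]  # project onto chosen dimensions
scores = lof_scores(projected, k=5)
suspect = max(range(len(scores)), key=scores.__getitem__)
print("selected dimensions:", dims)
print("highest-scoring point:", suspect, "planted outlier:", outlier_index)
```

In this kind of score, values near 1 indicate a point whose local density resembles that of its neighbors, while clearly larger values suggest an outlier. Projecting first matters because in the full six-dimensional space the four noise dimensions dominate the distances, which is the sparsity problem the abstract describes.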
Source
《现代电子技术》 (Modern Electronics Technique)
2006, No. 15, pp. 67-69 (3 pages)
Keywords
outlier detection
k-nearest neighbor
high-dimensional space
density-based
data mining