Abstract
Projecting high-dimensional data onto subspaces is one effective way to mitigate the curse of dimensionality. To improve mining efficiency, we propose a new two-stage, subspace-based outlier detection algorithm that uses a density threshold to screen candidate outliers and so reduce the amount of computation. In the first stage, the algorithm computes the density ratio of every data object in each dimension, takes the log-average of the product of the density ratios over all dimensions as the object's density coefficient, and selects the candidate outliers. In the second stage, it takes the product of the deviation degrees of a candidate's neighbors in each correlation subspace as the deviation ratio, takes the product of the density coefficient and the deviation ratio as the outlier coefficient, and uses that coefficient to determine the outliers. Because the outlier coefficient is computed only for the candidate outliers, mining efficiency improves. Experiments on UCI data sets show that the algorithm preserves the accuracy of the mining results while improving mining efficiency.
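The first stage (density coefficient and candidate selection) can be illustrated with a minimal sketch. The abstract does not define how the per-dimension density ratio is computed, so everything below is an assumption made for illustration: the 1-D density is taken as k divided by the distance to the k-th nearest neighbor in that dimension, the density ratio is the median density of the dimension divided by the object's own density, and "log-average of the product" is read as the mean of the log ratios over all dimensions. The function names, the parameter `k`, and `threshold` are hypothetical and are not taken from the paper. The second stage is not sketched, because the abstract does not say how correlation subspaces are determined.

```python
import math
import statistics


def density_1d(values, k):
    """Assumed 1-D density: k / (distance to the k-th nearest neighbor)."""
    eps = 1e-12  # guards against division by zero for duplicate values
    densities = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        densities.append(k / (dists[k - 1] + eps))
    return densities


def density_coefficients(data, k=2):
    """Mean over dimensions of log(density ratio) for every object.

    Assumed density ratio: median density of the dimension divided by the
    object's own density, so sparse objects get large positive values.
    Requires more than k objects.
    """
    n_dims = len(data[0])
    coefs = [0.0] * len(data)
    for j in range(n_dims):
        column = [row[j] for row in data]
        dens = density_1d(column, k)
        typical = statistics.median(dens)
        for i, d in enumerate(dens):
            # log of a product equals the sum of logs; dividing by n_dims averages it
            coefs[i] += math.log(typical / d) / n_dims
    return coefs


def select_candidates(coefs, threshold):
    """Indices of objects whose density coefficient exceeds the threshold."""
    return [i for i, c in enumerate(coefs) if c > threshold]
```

Under these assumptions, only the indices returned by `select_candidates` would go on to the more expensive second-stage computation, which is where the claimed efficiency gain comes from.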
Source
《太原科技大学学报》
2017, No. 2, pp. 88-92 (5 pages)
Journal of Taiyuan University of Science and Technology
Keywords
outlier detection; high-dimensional data; projection; correlation subspace