Abstract
This paper proposes OMAGT (Outliers Mining Algorithm based on Grid Techniques), an outlier mining algorithm for large, high-dimensional datasets. Exploiting the distribution characteristics of such datasets, the algorithm first uses a grid-based method to locate clustering regions and deletes the clustered points within those regions that cannot be outliers. It then applies the Local Outlier Factor (LOF) algorithm to the remaining points to mine outliers. OMAGT releases clustering information dynamically, so the information retained for outlier mining stays within a bounded amount of memory, which improves both the time efficiency and the space efficiency of the algorithm. Theoretical analysis and experimental results show that OMAGT is feasible and effective.
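The two-stage idea described in the abstract (grid-based pruning of clustered points, followed by LOF on what remains) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `lof_scores` and `grid_then_lof`, the uniform `cell_size`, the per-cell `density_threshold`, and the rule "drop every point in a dense cell" are all assumptions made for illustration, and the dynamic release of clustering information is not modeled. The LOF part uses exactly k neighbors and assumes there are no duplicate points.

```python
import math
from collections import defaultdict


def lof_scores(points, k):
    """Simplified LOF: exactly k nearest neighbors, no duplicate points assumed."""
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points to compute LOF")
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        nn = order[:k]
        neighbors.append(nn)
        k_dist.append(dist[i][nn[-1]])  # distance to the k-th nearest neighbor
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean neighbor density divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


def grid_then_lof(points, cell_size, density_threshold, k):
    """Drop points in dense grid cells, then score the remaining points with LOF.

    Returns a dict mapping original point index -> LOF score.
    The pruning rule is a hypothetical stand-in for the paper's
    clustering-region detection.
    """
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(x / cell_size) for x in p)].append(idx)
    # Points in cells below the density threshold stay as outlier candidates.
    candidates = sorted(
        i
        for members in cells.values()
        if len(members) < density_threshold
        for i in members
    )
    scores = lof_scores([points[i] for i in candidates], k)
    return dict(zip(candidates, scores))
```

Because LOF is computed only over the candidate points, the pairwise distance table grows with the number of surviving points rather than with the full dataset, which is the kind of memory saving the abstract refers to. In this sketch the pruned cluster points no longer contribute to the density estimates, so the scores are not identical to LOF computed on the full dataset.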
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2007, Issue 10, pp. 2369-2371, 2382 (4 pages in total)
Funding
Supported by the National Natural Science Foundation of China (70371015)
Keywords
data mining
outliers
grid
clustering region