
Algorithm of outliers mining based on grid techniques in high dimension large dataset (cited by: 3)
Abstract: This paper proposes OMAGT (Outliers Mining Algorithm based on Grid Techniques), an outlier mining algorithm for large high-dimensional datasets. Exploiting the distribution characteristics of such datasets, the algorithm first uses a grid-based method to find clustering regions and deletes the clustered points inside those regions that cannot be outliers. It then applies the Local Outlier Factor (LOF) algorithm to the remaining points. OMAGT releases clustering information dynamically, so the information retained for outlier mining stays within a bounded amount of memory, which improves both time and space efficiency. Theoretical analysis and experimental results show that OMAGT is feasible and effective.
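Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 10, pp. 2369-2371, 2382 (4 pages)
Funding: National Natural Science Foundation of China (grant 70371015)
Keywords: data mining; outliers; grid; clustering region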
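
The two-stage pipeline summarized in the abstract (grid-based pruning of clustering regions, then LOF on the points that remain) can be sketched in Python as follows. This is a minimal illustration and not the authors' OMAGT implementation: the function names `grid_prune` and `lof_scores`, the parameters `cell_width`, `dense_threshold`, and `k`, and the toy data are all assumptions made here. The abstract does not say how clustering regions are delimited or how memory is released, so this sketch simply treats every grid cell holding at least `dense_threshold` points as a clustering region and drops it whole.

```python
import math
from collections import Counter


def grid_prune(points, cell_width, dense_threshold):
    """Return indices of points that lie outside dense grid cells.

    Assumption (not from the paper): a cell with at least
    `dense_threshold` points counts as a clustering region, and all of
    its points are discarded as non-outliers.
    """
    def cell_of(p):
        return tuple(math.floor(x / cell_width) for x in p)

    counts = Counter(cell_of(p) for p in points)
    return [i for i, p in enumerate(points)
            if counts[cell_of(p)] < dense_threshold]


def lof_scores(points, k):
    """Brute-force Local Outlier Factor for a small list of points.

    Requires len(points) > k and no duplicate points (a zero
    reachability sum would divide by zero). Ties at the k-th distance
    are broken by index rather than widening the neighbourhood.
    """
    n = len(points)
    # k nearest neighbours of each point as (distance, index) pairs
    knn = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        knn.append(dists[:k])
    k_dist = [nbrs[-1][0] for nbrs in knn]

    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in knn[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density
    return [sum(lrd[j] for _, j in knn[i]) / (len(knn[i]) * lrd[i])
            for i in range(n)]


# Toy 2-D data: a tight 5x5 lattice, four loosely spaced points, one far point.
cluster = [(0.1 + 0.2 * i, 0.1 + 0.2 * j) for i in range(5) for j in range(5)]
sparse = [(2.5, 2.5), (3.5, 2.5), (2.5, 3.5), (3.5, 3.5)]
outlier = [(10.5, 10.5)]
points = cluster + sparse + outlier

kept = grid_prune(points, cell_width=1.0, dense_threshold=5)
scores = lof_scores([points[i] for i in kept], k=2)
for i, s in zip(kept, scores):
    print(i, points[i], round(s, 2))
```

On this toy data the 25 lattice points fall in one dense cell and are pruned before any distance is computed, the four loosely spaced points receive an LOF of about 1, and the isolated point (10.5, 10.5) receives the largest score. The saving comes from the pruning step: the quadratic neighbour search only runs on the points that survive it.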

References (8)

  • [1] BARNETT V, LEWIS T. Outliers in statistical data[M]. New York: John Wiley & Sons, 1994.
  • [2] JOHNSON T, KWOK I, NG R. Fast computation of 2-dimensional depth contours[C]// Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining. New York: AAAI Press, 1998: 224-228.
  • [3] KNORR E M, NG R T. Algorithms for mining distance-based outliers in large databases[C]// Proceedings of the 24th International Conference on Very Large Data Bases. New York: Morgan Kaufmann, 1998: 392-403.
  • [4] BREUNIG M, KRIEGEL H, NG R T, et al. LOF: identifying density-based local outliers[C]// Proceedings of ACM SIGMOD International Conference on Management of Data. Dallas: ACM Press, 2000: 93-104.
  • [5] JIN W, TUNG A K H, HAN J. Mining top-n local outliers in large databases[C]// Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2001: 293-298.
  • [6] ZHAO Y C, SONG J D. AGRID: an efficient algorithm for clustering large high-dimensional datasets[C]// Proceedings of PAKDD 2003, LNCS 2637. Berlin: Springer-Verlag, 2003, 15: 271-282.
  • [7] MELLI G. Dataset generator (datgen)[CP/OL]. [2007-03-01]. http://www.Datasetgenerator.com/.
  • [8] AGRAWAL R, GEHRKE J, GUNOPULOS D, et al. Automatic subspace clustering of high dimensional data for data mining applications[C]// Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 1998: 94-105.

