Outlier detection based on rough reduction and grid (基于粗约简和网格的离群点检测) Cited by: 10

Outliers detecting based on rough reduction and grid
Abstract: To address the time and space inefficiency of existing outlier mining methods on high-dimensional, massive data, this paper proposes RRGOD, an outlier detection algorithm based on rough reduction and grids. Building on the density-based outlier detection algorithm LOF, the algorithm draws on rough set theory and introduces the concept of attribute weights: attributes whose weight falls below an importance threshold are eliminated to reduce dimensionality, which cuts the computation required for clustering. In the grid clustering stage, the traditional grid partitioning method is improved by introducing the concept of an attribute-dimension radius vector, yielding a variable grid partitioning method that divides the grid space adaptively according to the characteristics of the data set. Experiments were carried out on real and simulated data sets. The results show that the algorithm significantly improves detection efficiency while maintaining sufficient accuracy.
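Authors: Wang Jinghua (王敬华), Jin Peng (金鹏)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2015, No. 3, pp. 133-137, 180 (6 pages)
Funding: National Natural Science Foundation of China (No. 61170017, No. 61370108)
Keywords: data mining; outlier detection; rough set; grid; attribute weights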
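
The three stages named in the abstract (attribute reduction by weight, variable grid partitioning, LOF scoring) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the attribute weight is taken to be the share of indiscernibility classes lost when an attribute is dropped, the radius vector is taken to be half the per-dimension standard deviation, and all function names, parameters, and default thresholds are hypothetical. This is not the authors' RRGOD implementation.

```python
import numpy as np
from collections import Counter

def attribute_weights(X_disc):
    """Assumed rough-set style weight: share of indiscernibility
    classes lost when one attribute is dropped (not the paper's formula)."""
    n_all = len({tuple(row) for row in X_disc})
    weights = []
    for j in range(X_disc.shape[1]):
        reduced = np.delete(X_disc, j, axis=1)
        n_without = len({tuple(row) for row in reduced})
        weights.append((n_all - n_without) / n_all)
    return np.array(weights)

def grid_candidates(X, radius, min_pts):
    """Indices of points in sparse cells; the cell width in
    dimension d is 2 * radius[d], so widths vary per dimension."""
    idx = np.floor((X - X.min(axis=0)) / (2 * radius)).astype(int)
    cells = [tuple(c) for c in idx]
    counts = Counter(cells)
    return np.array([i for i, c in enumerate(cells) if counts[c] < min_pts],
                    dtype=int)

def lof_scores(X, k):
    """Plain LOF with brute-force distances (ties in the
    k-neighbourhood are ignored; assumes no duplicate points)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1)[:, :k]
    k_dist = d[np.arange(len(X)), knn[:, -1]]
    # reach-dist(p, o) = max(k-distance(o), d(p, o))
    reach = np.maximum(k_dist[knn], np.take_along_axis(d, knn, axis=1))
    lrd = 1.0 / reach.mean(axis=1)
    return lrd[knn].mean(axis=1) / lrd

def detect_outliers(X, n_bins=4, weight_threshold=0.05,
                    min_pts=5, k=10, lof_threshold=1.5):
    # Stage 1: equal-width discretization, then drop low-weight attributes.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    X_disc = np.minimum((n_bins * (X - lo) / span).astype(int), n_bins - 1)
    keep = attribute_weights(X_disc) >= weight_threshold
    Xr = X[:, keep]
    # Stage 2: variable grid; the radius vector here is an assumed heuristic.
    radius = Xr.std(axis=0) / 2
    cand = grid_candidates(Xr, radius, min_pts)
    # Stage 3: LOF. For brevity it is computed for all points; only
    # points in sparse cells are eligible to be reported.
    scores = lof_scores(Xr, k)
    return cand[scores[cand] > lof_threshold]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cluster = np.column_stack([rng.normal(size=(200, 2)), np.ones(200)])
    X = np.vstack([cluster, [[8.0, 8.0, 1.0]]])  # last row is a planted outlier
    print(detect_outliers(X))
```

In this sketch the constant third attribute receives weight 0 and is dropped before the grid is built, and points in dense cells are never reported. A faithful implementation would also restrict the LOF computation itself to the sparse-cell candidates, which is where the efficiency gain described in the abstract would come from.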

References (15)

  • 1 Hawkins D. Identification of outliers[M]. London: Chapman and Hall, 1980.
  • 2 Johnson T, Kwok I, Ng R. Fast computation of 2-dimensional depth contours[C]//Proc of the 4th Int'l Conf on Knowledge Discovery and Data Mining, 1998: 224-228.
  • 3 Barnett V, Lewis T. Outliers in statistical data[M]. 3rd ed. New York: John Wiley and Sons, 1994.
  • 4 Breunig M M, Kriegel H P, Ng R T, et al. LOF: identifying density-based local outliers[C]//Proceedings of 2000 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2000: 93-104.
  • 5 Knorr E M, Ng R T. Algorithms for mining distance-based outliers in large datasets[C]//Proceedings of the 24th International Conference on Very Large Databases. New York: ACM Press, 1998: 392-403.
  • 6 Pawlak Z. Rough sets[J]. International Journal of Computer and Information Sciences, 1982, 11: 341-356.
  • 7 Sheikholeslami G, Chatterjee S, Zhang A. WaveCluster: a multi-resolution clustering approach for very large spatial databases[C]//Proceedings of the 24th VLDB Conference, New York, USA, 1998: 428-439.
  • 8 Ng R T, Han J. Efficient and effective clustering methods for spatial data mining[C]//Proc of the 20th VLDB Conference, Santiago, 1994: 144-155.
  • 9 Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proc of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, 1996: 226-231.
  • 10 Hinneburg A, Keim D A. Optimal grid-clustering: towards breaking the curse of dimensionality in high-dimensional clustering[C]//Proceedings of the 25th VLDB Conference, 1999: 506-517.
