
Research on a spatial outlier mining algorithm based on holographic entropy (基于全息熵的空间离群点挖掘算法研究). Cited by: 4

Spatial outlier detection based on holographic entropy
Abstract: Distance-based and density-based outlier detection algorithms scale poorly with both dimensionality and data volume. Information-theoretic outlier detection algorithms, in turn, assume mutually independent, categorical attributes, so the autocorrelation and heterogeneity of spatial data make them ill-suited to spatial outlier detection. This paper therefore proposes a holographic-entropy-based spatial outlier detection algorithm for mixed attributes. The algorithm partitions the data into regions using a region-identifier attribute, determines spatial neighborhoods within each region from spatial relationships, and retrieves them with an R*-tree. On this basis, it defines a holographic-entropy-based measure of spatial outlier degree and a spatial outlier mining algorithm, which together address both measuring the outlier degree over mixed attributes and mining the outliers. Because region partitioning lends itself to parallel computation, the approach can handle large data volumes. Theoretical analysis and experiments show that the proposed algorithm has advantages in computational efficiency and in the interpretability of its results.
Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core; 2014, No. 2, pp. 369-372, 397 (5 pages)
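Funding: National Natural Science Foundation of China (61300228); Specialized Research Fund for the Doctoral Program of Higher Education (20093227110005)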
Keywords: holographic entropy; R*-tree; spatial outlier; outlier detection; mixed attributes
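
The abstract names the pipeline (region partitioning, spatial neighborhoods, an entropy-based outlier degree) but does not give its formulas, so the following is only an illustrative sketch under stated assumptions, not the paper's algorithm. It assumes that holo-entropy can be taken as the sum of per-attribute Shannon entropies, that the outlier degree of an object is the change in its neighborhood's holo-entropy when the object is added, that all attributes are categorical (mixed attributes would first need discretization), and that a brute-force k-nearest-neighbor search stands in for the R*-tree. The function names, the record layout, and the parameter `k` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def attribute_entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def holo_entropy(rows):
    """Assumed definition: sum of the entropies of the individual attributes."""
    if not rows:
        return 0.0
    return sum(attribute_entropy(column) for column in zip(*rows))


def spatial_outlier_scores(objects, k=3):
    """Score each object against its k nearest neighbors in the same region.

    objects: list of dicts with keys 'region', 'xy' (coordinates), 'attrs' (tuple).
    Returns one score per object; a higher score means more outlying.
    """
    # Step 1: partition by the region-identifier attribute.
    by_region = defaultdict(list)
    for idx, obj in enumerate(objects):
        by_region[obj["region"]].append(idx)

    scores = [0.0] * len(objects)
    for members in by_region.values():
        for i in members:
            # Step 2: spatial neighborhood inside the region
            # (brute-force search here; the paper uses an R*-tree).
            others = sorted(
                (j for j in members if j != i),
                key=lambda j: math.dist(objects[i]["xy"], objects[j]["xy"]),
            )
            neighborhood = [objects[j]["attrs"] for j in others[:k]]
            # Step 3: assumed outlier degree = entropy added by the object.
            with_object = holo_entropy(neighborhood + [objects[i]["attrs"]])
            scores[i] = with_object - holo_entropy(neighborhood)
    return scores


# Hypothetical data: one 'urban' object among 'forest' objects in region A,
# and two 'urban' objects in region B where that value is ordinary.
demo = [
    {"region": "A", "xy": (0.0, 0.0), "attrs": ("forest", "low")},
    {"region": "A", "xy": (0.0, 1.0), "attrs": ("forest", "low")},
    {"region": "A", "xy": (1.0, 0.0), "attrs": ("forest", "low")},
    {"region": "A", "xy": (1.0, 1.0), "attrs": ("forest", "low")},
    {"region": "A", "xy": (0.5, 0.5), "attrs": ("urban", "high")},
    {"region": "B", "xy": (5.0, 5.0), "attrs": ("urban", "high")},
    {"region": "B", "xy": (5.0, 6.0), "attrs": ("urban", "high")},
]
scores = spatial_outlier_scores(demo, k=3)
most_outlying = scores.index(max(scores))
```

In this toy data the highest score goes to the 'urban' object in region A, while the 'urban' objects in region B score zero, which illustrates why partitioning by region before scoring matters for heterogeneous spatial data. Because each region is scored independently, the outer loop over regions is also the natural unit for parallel execution.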
