期刊文献+

基于MST聚类的空间数据离群挖掘算法 被引量:1

Algorithm of spatial outlier mining based on MST clustering
下载PDF
导出
摘要 空间离群是指空间邻域中属性特征值明显不同于其他对象的空间对象,空间数据离群挖掘能为人们提供很多有趣的信息,但空间数据具有复杂的拓扑关系、方位关系和度量关系等空间特征,传统的面向事务型数据库的离群挖掘算法并不适用于空间数据库。本文提出了基于MST(Minimum Spanning Tree,最小生成树)聚类的空间数据离群挖掘算法(SOM);有机结合了最小生成树理论与密度的方法,既体现了空间离群的局部特性,又体现了空间离群的孤立程度。该算法通过MST维护空间数据的基本空间结构特征,通过打断MST中最不一致的边形成MST聚类,不仅具有密度的聚类方法能够聚集非球状簇和分布不均的数据集的特点,而且聚类结果不依赖于用户参数的选择,因此,离群挖掘结果更合理。最后,通过实例数据,验证了该算法的有效性,它适用于大规模空间数据集的离群挖掘。 A spatial outlier is a spatial object whose non-spatial attribute values are significantly deviated from the other data's in the dataset. How to detect spatial outliers from spatial dataset and to explain the reason causes the anomaly in practical application have become more and more interesting to many researchers. Spatial outliers mining can bring us a lot of interesting information, but for the complicated characteristic of spatial data, such as topological relation, orientation relation, measurement relation, and so on, traditional algorithms for outlier mining in business database seem to deficient in spatial dataset, the main problem lies in the difficulty to maintain spatial structure characteristics for most existing algorithms during the process of outlier mining. Thanks to the similarities between clustering and outlier mining, clustering based outlier mining is an important way to detect anomalies from dataset. However, due to the diversity of clustering algorithms, it is difficult to choose a proper one for outlier mining, and the main purpose of clustering is to find out the principal features of the dataset, outliers are the by-products of clustering. Based on minimum spanning tree clustering, a new algorithm for spatial outlier mining called SOM is proposed. The algorithm keeps basic spatial structure characteristics of spatial objects through the use of geometric structure : Delaunay triangulated irregular network and minimum spanning tree ( MST), and it gains MST clustering by cutting off several most inconsistent edges of MST, so that it not only owns the function that it can acquire clusters from non-spherical and unbalanced datasets as the density-based cluster algorithms does, but also has the advantage that it doesn't depend on user's pre-set parameters, so the clustering result is usually more reasonable. Finally, the validity of SOM algorithm is validated by real application of geochemical soil elements dataset inspected to coastal areas of Fujian province, through analysis it is found that the algorithm is also applicable for spatial outlier mining in massive spatial dataset.
出处 《地球信息科学》 CSCD 2008年第5期586-592,共7页 Geo-information Science
基金 国家自然科学基金项目"面向地学应用的时空多尺度数据挖掘研究"(60602052) 福建省自然科学基金项目"涉及障碍物和便利体约束条件的空间聚类研究"(2006J0131) 福建省科技计划重点项目"网格环境下知识服务的语义建模技术与应用"(2005H086) 福建省高等学校新世纪优秀人才支撑计划
关键词 SOM算法 聚类的离群 空间离群 MST聚类 SOM algorithm cluster-based outliers spatial outliers MST clustering
  • 相关文献

参考文献12

  • 1邸凯昌,李德仁,李德毅.空间数据发掘和知识发现的框架[J].武汉测绘科技大学学报,1997,22(4):328-332. 被引量:60
  • 2S Shekhar, C -T Lu, P Zhang, Detecting Graph-Based Spatial Outlier: Algorithms and Applications (A Summary of Results). In Proc. of the Seventh ACM-SIGKDD Int' 1 Conference on Knowledge Discovery and Data Mining, Aug, 2001.
  • 3黄洪宇,林甲祥,陈崇成,樊明辉.离群数据挖掘综述[J].计算机应用研究,2006,23(8):8-13. 被引量:42
  • 4Hodge V, Austin J. A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 2004, 22 (2) : 85 - 126.
  • 5S Shekhar, C -T. Lu, P Zhang. A unified approach to detecting spatial outliers. Geo-Informatica, 2003, 7 (2) : 139 - 166.
  • 6S Shekhar, C -T Lu, P Zhang. Detecting graph-based spatial Outlier. Intelligent Data Analysis: An International Journal, 2002, 6 (5) : 451 - 468.
  • 7He Z, Xu X, Deng S. Discovering cluster-based local out- liers. Pattern Recognition Letters, 2003, 24 ( 9 - 10 ) : 1642 - 1650.
  • 8ester M, Kriegel H P, Sander J, et al. A Density based algorithm for discovering clusters in large spatial databases. Proc of KDD'96. Portland OR, 1996, 226-231.
  • 9Ng R T, Han J. Efficient clustering methods for spatial data mining. In Proc. The 20th International Conference on Very large Data Bases. Santiago, 1994, 144 - 155.
  • 10M M Breunig, H Kriegel, R T Ng, et al. LOF: Identifying density-based local outliers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Texas: ACM Press, 2000, 93 - 104.

二级参考文献25

  • 1Zheng Binxiang,Du Xiuhua & Xi Yugeng Institute of Automation, Shanghai Jiaotong University,Shanghai 200030,P.R.China.Outliers Mining in Time Series Data Sets[J].Journal of Systems Engineering and Electronics,2002,13(1):93-97. 被引量:3
  • 2范大昭,雷蓉,张永生.从地理数据库中探测奇异值[J].测绘科学,2004,29(5):12-15. 被引量:2
  • 3陆声链,林士敏.基于距离的孤立点检测及其应用[J].计算机与数字工程,2004,32(5):94-97. 被引量:23
  • 4Jain AK,Murty MN,Flynn PJ.Data clustefing:a review[J],ACM Computing Surveys, 1999 ; 31 (3) : 264-323.
  • 5Ester M,Kriegel HP,Sander Jet al.A density-based algorithm for discovering clusters in large spatial databases with noise[C].In:Proceedings of the 2^nd Int Conf on Knowledge Discovery and Data Mining.Portland, Dregon,1996:226-231.
  • 6Han J,Kamber M.Data Mining:Concepts and Techniques[M],Simon Fraser University,2000.
  • 7Zahn CT.Graph theoretical methods for detecting and describing gestalt clusters[J].IEEE Transactions on Computers, 1971 ;20( 1 ).
  • 8Johnson DB ,Metaxas P.A parallel algorithm for computing minimum spanning trees[C].In:Proceedings of the 4th annual ACM symposium on Parallel algorithms and architectures, San Diego, California, United States, 1992 : 363-372.
  • 9Algorithms for clustering data[M],New Jersey:Prentice-Hall Inc, 1996.
  • 10Karypis G,Han EH,Kumar V,CHAMELEON:A Hierarchical Clustering Algorithm Using Dynamic Modeling[J],IEEE Computer, 1999;32 (8) :68-75.

共引文献104

同被引文献12

引证文献1

二级引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部