
New approach to spatial neighborhood outlier detection based on entropy measurement
Abstract: Outlier detection algorithms fall mainly into two classes. The first targets statistical data and treats all attributes as a single multi-dimensional space, without distinguishing spatial from non-spatial dimensions; such algorithms may produce incorrect judgments or find meaningless outliers. The second targets spatial data and does distinguish spatial from non-spatial dimensions, but these algorithms are either too inefficient or unable to detect neighborhood outliers. To overcome these shortcomings, this paper introduces the concept of entropy weight and proposes a new entropy-weight-based algorithm for measuring spatial neighborhood outliers. The algorithm targets spatial data and distinguishes spatial from non-spatial dimensions: the spatial attributes, through a spatial index, partition the data into spatial neighborhoods; entropy theory determines the weights of the non-spatial attributes; and the non-spatial attributes are used to compute a spatial neighborhood outlier factor, which measures how strongly an object deviates from its spatial neighborhood. Theoretical analysis shows that the algorithm is reasonable. Experimental results show that it depends little on user input and achieves high detection accuracy and computational efficiency.
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 21, pp. 41-43, 50 (4 pages)
Funding: Natural Science Foundation of Shaanxi Province (No. 2005F45); Shaanxi Science and Technology Research Program (2005K04-G13)
Keywords: entropy measurement; spatial neighborhood outlier detection; spatial neighborhood outlier factor; space division
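
The abstract names three steps: partition the data into spatial neighborhoods, weight the non-spatial attributes by entropy, and score each object by how far its non-spatial attributes deviate from its neighborhood. The Python sketch below is one possible reading of those steps, not the authors' algorithm, whose formulas are not given in this record. It rests on several assumptions: the weights follow the standard textbook entropy-weight method, a brute-force k-nearest-neighbor search stands in for the spatial index, and the deviation factor is taken to be the weighted distance between an object and the mean of its neighbors. All function names and the sample data are hypothetical.

```python
import math


def min_max_scale(rows):
    """Scale every attribute column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
            for row in rows]


def entropy_weights(scaled):
    """Textbook entropy-weight method (assumed, not taken from the paper).

    Columns whose values are spread less evenly have lower entropy and
    therefore receive a larger weight. Needs at least two rows.
    """
    n = len(scaled)
    diversities = []
    for col in zip(*scaled):
        total = sum(col)
        if total == 0:  # constant column: carries no information
            diversities.append(0.0)
            continue
        probs = [v / total for v in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversities.append(max(0.0, 1.0 - entropy))
    s = sum(diversities)
    m = len(diversities)
    return [d / s for d in diversities] if s > 0 else [1.0 / m] * m


def k_nearest(coords, i, k):
    """Brute-force spatial neighborhood; a spatial index would replace this."""
    others = (j for j in range(len(coords)) if j != i)
    return sorted(others, key=lambda j: math.dist(coords[i], coords[j]))[:k]


def deviation_factors(coords, attrs, k=3):
    """Weighted distance of each object from the mean of its k spatial neighbors."""
    scaled = min_max_scale(attrs)
    weights = entropy_weights(scaled)
    factors = []
    for i in range(len(coords)):
        nbrs = k_nearest(coords, i, k)
        means = [sum(scaled[j][a] for j in nbrs) / len(nbrs)
                 for a in range(len(weights))]
        factors.append(math.sqrt(sum(w * (x - mu) ** 2
                                     for w, x, mu in zip(weights, scaled[i], means))))
    return factors


# Hypothetical data: two spatial clusters; the object at (1, 1) has a
# first attribute far from that of its spatial neighbors.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
attrs = [[10, 1.0], [11, 1.1], [10, 0.9], [50, 1.0],
         [30, 1.0], [31, 1.1], [29, 0.9], [30, 1.0]]
weights = entropy_weights(min_max_scale(attrs))
factors = deviation_factors(coords, attrs, k=3)
ranked = sorted(range(len(coords)), key=lambda i: factors[i], reverse=True)
print(ranked[0], coords[ranked[0]])
```

Because spatial coordinates are used only to choose neighbors and never enter the score, the object at (1, 1) ranks first here even though its attribute value of 50 would look less extreme against the whole data set, where the second cluster sits near 30. That separation of spatial and non-spatial dimensions is the point the abstract emphasizes.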

References (11)

  • [1] Han J, Kamber M. Data mining: concepts and techniques[M]. San Francisco, CA, USA: Morgan Kaufmann Publishers, 2000: 381-389.
  • [2] Han J, Kamber M. Data mining: concepts and techniques[M]. 2nd ed. San Francisco: Morgan Kaufmann Publishers, 2006.
  • [3] Wei Li, Gong Xue-qing, Qian Wei-ning, Zhou Ao-ying. Outlier detection in high-dimensional space[J]. Journal of Software (软件学报), 2002, 13(2): 280-290.
  • [4] Shekhar S, Lu Chang-tien, Zhang Pu-sheng. A unified approach to detecting spatial outliers[J]. GeoInformatica, 2003, 7(2): 139-166.
  • [5] Lu Chang-tien, Chen De-chang, Kou Yu-feng. Detecting spatial outliers with multiple attributes[C]//Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 03), Sacramento, 2003: 122-128.
  • [6] Breunig M M, Kriegel H P, Ng R T, et al. LOF: Identifying density-based local outliers[C]//Proceedings of the ACM SIGMOD Conference, Dallas, Texas, 2000: 93-104.
  • [7] Tang J, Chen Z, Fu A, et al. Enhancing effectiveness of outlier detections for low-density patterns[C]//Proceedings of Advances in Knowledge Discovery and Data Mining, 6th Pacific-Asia Conference, Taipei, China, 2002: 535-548.
  • [8] Papadimitriou S, Kitagawa H, Gibbons P B. LOCI: Fast outlier detection using the local correlation integral[C]//Proceedings of the 19th International Conference on Data Engineering, Bangalore. Los Alamitos: IEEE Computer Society, 2003: 315-326.
  • [9] Chawla S, Sun Pei. SLOM: A new measure for local spatial outliers[J]. Knowledge and Information Systems, 2006, 9(4): 412-429.
  • [10] He Z, Xu X, Deng S. Discovering cluster-based local outliers[J]. Pattern Recognition Letters, 2003, 24(9-10): 1642-1650.

