Finding Spatial Neighborhood Outliers Based on Entropy Measurement

A new approach to spatial neighborhood outlier detection based on entropy measurement

Abstract: Outlier detection algorithms fall mainly into two classes. The first targets statistical data and treats all attributes as one multi-dimensional space without distinguishing spatial from non-spatial dimensions; such algorithms may produce incorrect judgments or find meaningless outliers. The second targets spatial data and does distinguish spatial from non-spatial dimensions, but these algorithms are either inefficient or unable to detect neighborhood outliers. To overcome these shortcomings, this paper introduces the concept of entropy weight and proposes a new entropy-weight-based algorithm for measuring spatial neighborhood outliers. The algorithm targets spatial data and distinguishes spatial from non-spatial dimensions: a spatial index partitions the data into spatial neighborhoods, entropy theory determines the weights of the non-spatial attributes, and the non-spatial attributes are used to compute a spatial outlier factor that measures how strongly each object deviates from its spatial neighborhood. Theoretical analysis shows that the algorithm is reasonable. Experimental results show that it depends little on user input and achieves high detection accuracy and computational efficiency.
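The abstract does not state the paper's formulas, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes the standard entropy-weight method for the non-spatial attributes, uses a brute-force k-nearest-neighbor search in place of the spatial index, and takes the outlier factor to be the weighted, range-scaled deviation of an object from the mean of its spatial neighbors. All function names (`entropy_weights`, `k_nearest`, `outlier_factors`) and the parameter `k` are hypothetical.

```python
import math


def entropy_weights(rows):
    """Standard entropy-weight method (assumed here, not taken from the paper).

    rows: list of objects, each a list of non-spatial attribute values.
    Attributes whose scaled values are concentrated in few objects have low
    entropy and therefore receive a larger weight.
    """
    n, m = len(rows), len(rows[0])
    diversities = []
    for j in range(m):
        col = [r[j] for r in rows]
        lo, span = min(col), max(col) - min(col)
        # Min-max scale to [0, 1]; a constant column becomes all ones.
        scaled = [(v - lo) / span if span else 1.0 for v in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversities.append(1.0 - entropy)
    total_div = sum(diversities)
    if total_div == 0:
        return [1.0 / m] * m
    return [d / total_div for d in diversities]


def k_nearest(coords, i, k):
    """Indices of the k objects spatially closest to object i (brute force)."""
    others = [j for j in range(len(coords)) if j != i]
    others.sort(key=lambda j: math.dist(coords[i], coords[j]))
    return others[:k]


def outlier_factors(coords, attrs, k):
    """Weighted deviation of each object from its spatial-neighborhood mean.

    Only the spatial coordinates decide who the neighbors are; only the
    non-spatial attributes decide how unusual an object is among them.
    """
    weights = entropy_weights(attrs)
    m = len(weights)
    # Attribute ranges, used to make deviations comparable across attributes.
    spans = [(max(r[j] for r in attrs) - min(r[j] for r in attrs)) or 1.0
             for j in range(m)]
    factors = []
    for i in range(len(coords)):
        nbrs = k_nearest(coords, i, k)
        score = 0.0
        for j in range(m):
            mean = sum(attrs[q][j] for q in nbrs) / len(nbrs)
            score += weights[j] * abs(attrs[i][j] - mean) / spans[j]
        factors.append(score)
    return factors
```

Objects with the largest factors would be reported as candidate spatial neighborhood outliers. A real implementation would replace the brute-force search with a spatial index such as an R-tree or a grid, which is where the efficiency gain described in the abstract would come from.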

Authors: SU Jinqi (苏锦旗), XUE Huifeng (薛惠锋), WU Huixin (吴慧欣)

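Affiliations: School of Automation, Northwestern Polytechnical University; School of Information Engineering, North China Institute of Water Conservancy and Hydroelectric Power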

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 21, pp. 41-43, 50 (4 pages)

Funding: Natural Science Foundation of Shaanxi Province (No. 2005F45); Shaanxi Science and Technology Research Program (2005K04-G13)

Keywords: entropy measurement; spatial neighborhood; outlier detection; spatial neighborhood outlier factor; space division

Classification: TP391.9 [Automation and Computer Technology: Computer Application Technology]
