An Algorithm for Clustering of Outliers Based on Key Attribute Subspace (cited by: 8)
Abstract: Discovering and analyzing outliers is an important part of data mining. Outliers may carry crucial information, so in some applications, such as credit card fraud in finance, calling fraud in telecommunications, network intrusion, and disease diagnosis, detecting them matters far more than detecting general patterns. Existing outlier mining algorithms focus on detecting and identifying outliers, yet the study of outliers covers both mining them and explaining why they are exceptional. Research on explaining and analyzing outliers currently lags behind outlier mining technology, and effective methods for interpreting a mined outlier set are lacking. Fully analyzing outliers inevitably requires substantial knowledge of the application domain. However, some further findings about outliers can be obtained by studying how the data set is distributed in attribute space. By analyzing the origin and characteristics of outliers and drawing on rough set theory, the concept of outlying partition similarity is defined, and an algorithm for clustering outliers based on key attribute subspace (COKAS) is proposed. The approach reveals the subspace characteristics of outliers, provides extended knowledge about the identified outliers, and improves understanding of the whole data set. Experimental results on two real multi-dimensional data sets show that the algorithm adapts well and is effective.
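Source: Journal of Computer Research and Development (计算机研究与发展), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 4, pp. 651-659 (9 pages)
Funding: National Natural Science Foundation of China (60403009); Natural Science Foundation of Chongqing (2005BB2224)
Keywords: outlier set; outlying partition similarity; key attribute subspace; clustering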