An Algorithm for Clustering of Outliers Based on Key Attribute Subspace

Original title: 一种基于关键域子空间的离群数据聚类算法 (cited by: 8)


Abstract: Discovering and analyzing outliers is an important part of data mining. Outliers may carry crucial information, so in some applications, such as credit card fraud in finance, calling fraud in telecommunications, network intrusion, and disease diagnosis, detecting them matters more than detecting general patterns. Existing outlier mining algorithms focus on detecting and identifying outliers, but the study of outliers covers both mining them and analyzing why they are exceptional, and research on explaining and analyzing outliers currently lags behind outlier mining technology. A full analysis of outliers inevitably requires substantial knowledge of the application domain. However, some further findings about outliers can be obtained by studying how the data set is distributed in attribute space. By analyzing the origin and characteristics of outliers and drawing on rough set theory, the paper defines the concept of outlying partition similarity and proposes COKAS, an algorithm for clustering outliers based on key attribute subspace. The approach reveals the subspace characteristics of the outliers, provides extended knowledge about the identified outliers, and improves understanding of the whole data set. Experimental results on two real multidimensional data sets show that the algorithm is adaptable and effective.
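The abstract names two ingredients, rough set theory and clustering of outliers by key attribute subspace, without giving definitions. The sketch below is only an illustration of the general idea, not the COKAS algorithm. It shows the standard rough-set indiscernibility partition on an attribute subset, then groups outliers whose (assumed, already known) key attribute subspaces overlap. Jaccard similarity is a stand-in here and is not the paper's outlying partition similarity. The function names, the threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict


def indiscernibility_partition(rows, attrs):
    """Standard rough-set partition U/IND(attrs): objects that agree on
    every attribute in attrs fall into the same block."""
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(idx)
    return sorted(blocks.values())


def jaccard(a, b):
    """Stand-in similarity between two attribute sets (NOT the paper's measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_subspace(key_subspaces, threshold=0.5):
    """Greedy grouping: each outlier joins the first cluster whose seed
    subspace is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed subspace, member ids)
    for oid in sorted(key_subspaces):
        sub = key_subspaces[oid]
        for seed, members in clusters:
            if jaccard(seed, sub) >= threshold:
                members.append(oid)
                break
        else:
            clusters.append((frozenset(sub), [oid]))
    return [members for _, members in clusters]


# Hypothetical data: three objects and three outliers with assumed subspaces.
rows = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "high"},
    {"age": "young", "income": "low"},
]
partition = indiscernibility_partition(rows, ["age", "income"])

key_subspaces = {"o1": {"age", "income"}, "o2": {"income"}, "o3": {"zip"}}
groups = cluster_by_subspace(key_subspaces)
```

With this sample data, objects 0 and 2 are indiscernible on `age` and `income`, and outliers `o1` and `o2` end up in one group because their assumed subspaces share `income`.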

Authors: Jin Yifu (金义富), Zhu Qingsheng (朱庆生), Xing Yongkang (邢永康)

Affiliation: College of Computer Science and Engineering, Chongqing University

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 4, pp. 651-659 (9 pages)

Funding: National Natural Science Foundation of China (60403009); Natural Science Foundation of Chongqing (2005BB2224)

Keywords: outlier set; outlying partition similarity; key attribute subspace; clustering

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]


