Discovering attribute features of a cluster for any clustering result based on an outlier detection technique

Cited by: 1

Abstract: Understanding clustering results helps in evaluating clustering quality, allows the clustering process to be adjusted accordingly, and makes the results more useful. However, it remains an open problem. This paper proposes a method that applies outlier detection to the output of an arbitrary clustering algorithm in order to discover the attribute features of each cluster, together with a new outlier detection algorithm based on a dissimilarity ratio. Outlier analysis is performed on the attribute descriptions of all clusters to identify the characteristic attributes of each cluster, which in turn explains the clustering result. The method is independent of the clustering algorithm used. It was applied to the results of the X-means, Frozen and DBScan algorithms on the UCI iris, ZOO and Housing datasets. The experiments show that it discovers the attribute features of clusters produced by different algorithms fairly successfully, supporting a deeper understanding of the clustering results.
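The abstract describes the approach only at a high level and does not define the dissimilarity-ratio outlier detector. The Python sketch below is therefore a hypothetical illustration of the general idea, not the authors' algorithm. It rests on four assumptions. First, a cluster's attribute description is taken to be the per-attribute mean. Second, the dissimilarity between two clusters on one attribute is the absolute difference of those means. Third, the score, computed by the hypothetical function `dissimilarity_ratio`, is a cluster's distance to its nearest other cluster divided by the average nearest-neighbour distance among the remaining clusters. Fourth, the threshold of 2.0 is an arbitrary choice. A cluster whose score on an attribute reaches the threshold is reported as having that attribute as a feature.

```python
import math
from statistics import mean


def cluster_profiles(rows, labels):
    """Describe each cluster by the mean of every attribute (assumed description)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*members)]
            for label, members in groups.items()}


def dissimilarity_ratio(values, i):
    """Hypothetical outlier score for values[i] among 1-D cluster descriptions.

    Numerator: distance from values[i] to its nearest other value.
    Denominator: average nearest-neighbour distance among the remaining values.
    """
    if len(values) < 3:
        raise ValueError("need at least three clusters")
    others = [v for j, v in enumerate(values) if j != i]
    d_i = min(abs(values[i] - v) for v in others)
    base = mean(
        min(abs(v - w) for k, w in enumerate(others) if k != j)
        for j, v in enumerate(others)
    )
    if base == 0:
        # The other clusters coincide, so any deviation is maximally unusual.
        return math.inf if d_i > 0 else 0.0
    return d_i / base


def feature_attributes(rows, labels, names, threshold=2.0):
    """Map each cluster to the attributes on which it is an outlier among clusters."""
    profiles = cluster_profiles(rows, labels)
    clusters = sorted(profiles)
    features = {c: [] for c in clusters}
    for a, name in enumerate(names):
        values = [profiles[c][a] for c in clusters]
        for i, c in enumerate(clusters):
            if dissimilarity_ratio(values, i) >= threshold:
                features[c].append(name)
    return features


if __name__ == "__main__":
    # Toy data with attributes (x, y): cluster "c" is unusual in y, "d" in x.
    rows = [(1, 5), (1, 5), (2, 6), (2, 6), (3, 20), (3, 20), (10, 7)]
    labels = ["a", "a", "b", "b", "c", "c", "d"]
    print(feature_attributes(rows, labels, ["x", "y"]))
    # {'a': [], 'b': [], 'c': ['y'], 'd': ['x']}
```

The sketch consumes only the data and the cluster labels. It therefore works with the output of any clustering algorithm, which mirrors the algorithm independence claimed in the abstract. The ratio as defined here needs at least three clusters.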

Authors: 陈英 (CHEN Ying), 顾国昌 (GU Guochang), 吕天阳 (LÜ Tianyang)

Affiliation: College of Computer Science and Technology, Harbin Engineering University

Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》), 2009, No. 3, pp. 312-317 (6 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals

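Funding: Specialized Research Fund for the Doctoral Program of Higher Education (20070217043); Basic Research Foundation of Harbin Engineering University (HEUFT05007)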

Keywords: clustering; attribute feature cluster; cluster analysis; outlier detection

CLC number: TP311 [Automation and Computer Technology / Computer Software and Theory]

Citation network

References (11)

1. HAN J, KAMBER M. Data Mining: Concepts and Techniques [M]. FAN Ming, MENG Xiaofeng, trans. Beijing: China Machine Press, 2001.
2. PELLEG D, MOORE A. X-means: extending K-means with efficient estimation of the number of clusters [C]// Proc 2000 Int Conf on Data Mining. San Francisco, 2000: 727-734.
3. SPRENGER T C, BRUNELLA R, GROSS M H. H-Blob: a hierarchical visual clustering method using implicit surfaces [C]// Proceedings of Visualization. Salt Lake City, USA, 2000: 61-68.
4. NAKAMURA T. Feature extraction of clusters based on FlexDice [C]// Proceedings of the 21st International Conference on Data Engineering Workshops. Tokyo, Japan, 2005: 1126-1130.
5. KNORR E, NG R. Finding intensional knowledge of distance-based outliers [C]// Proc of the VLDB Conf. Edinburgh, UK. San Francisco, USA: Morgan Kaufmann Publishers, 1999: 211-222.
6. RAMASWAMY S, RASTOGI R, SHIM K. Efficient algorithms for mining outliers from large data sets [C]// Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, USA, 2000: 427-438.
7. BREUNIG M M, KRIEGEL H P, NG R T, et al. LOF: identifying density-based local outliers [C]// Proceedings of the ACM SIGMOD International Conference on Management of Data. Dallas: ACM Press, 2000: 93-104.
8. HETTICH S, BLAKE C L, MERZ C J. UCI Repository of machine learning databases [EB/OL]. (2007-12-09). http://www.ics.uci.edu/~mlearn/MLRepository.html.
9. FALOUTSOS C, LIN K. FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets [C]// Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. San Jose: ACM Press, 1995: 163-174.
10. FRED A L N, LEITÃO J M N. A new cluster isolation criterion based on dissimilarity increments [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(8): 944-958.

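Co-citing literature (44)

1. 吕锡香, 杨波, 裴昌幸, 苏晓龙. Design of the detection engine of a data-mining-based intrusion detection system [J]. Journal of Xidian University, 2004, 31(4): 574-580. Cited by: 10
2. 赵守伟. Application of data mining in network anomaly detection [J]. Journal of Hebei University (Natural Science Edition), 2004, 24(4): 444-447. Cited by: 2
3. 刘芳, 孙杨军. A multidimensional association rule mining algorithm based on polyclonal selection [J]. Journal of Fudan University (Natural Science), 2004, 43(5): 742-745. Cited by: 9
4. 厍向阳, 彭文祥, 薛惠锋. Research on a genetic clustering algorithm satisfying two-dimensional spatial adjacency constraints [J]. Journal of Computer Applications, 2005, 25(10): 2395-2397.
5. 樊建聪, 张问银, 梁永全. A decision tree classification algorithm based on the Bayesian method [J]. Journal of Computer Applications, 2005, 25(12): 2882-2884. Cited by: 20
6. 王雪姣, 叶枫. Analysis of the operating quality of industrial production teams based on association rule algorithms [J]. Journal of Computer Applications, 2005, 25(B12): 211-212. Cited by: 2
7. 李新安, 石冰. A topic-specific Web search strategy based on the decision tree method [J]. Journal of Computer Applications, 2006, 26(1): 223-226. Cited by: 3
8. 王晓乔, 张桂新, 喻兴标. Research on preprocessing techniques for Web usage mining [J]. Journal of Xiangtan Normal University (Natural Science Edition), 2006, 28(2): 18-20.
9. 任江涛, 黄焕宇, 孙婧昊, 印鉴. Feature selection for high-dimensional data based on correlation analysis and genetic algorithms [J]. Journal of Computer Applications, 2006, 26(6): 1403-1405. Cited by: 16
10. 刘博, 彭宏, 郑启伦. A new data preprocessing algorithm: NLCA [J]. Journal of Computer Applications, 2006, 26(6): 1406-1408. Cited by: 3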

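Co-cited literature (6)

1. 董晓莉, 顾成奎, 王正欧. Research on shape-based similarity measures for time series [J]. Journal of Electronics & Information Technology, 2007, 29(5): 1228-1231. Cited by: 33
2. BARNETT V, LEWIS T. Outliers in Statistical Data [M]. John Wiley and Sons, 1994.
3. HUNG E, CHEUNG D W. Parallel mining of outliers in large database [J]. Distributed and Parallel Databases, Kluwer Academic Publishers, 2002, 12(1): 5-26.
4. 郑健, 皮德常. A clustering and outlier detection algorithm based on shared nearest neighbors [C]. First Academic Symposium of Communication Departments of Chinese Universities, 2007.
5. 成万里, 熊豪, 曲翠兰. Research on a real-time anomaly monitoring system for earthquake-precursor instrument data [J]. Digital Technology and Application, 2012, 30(10): 12-14. Cited by: 1
6. 翁颖钧, 朱仲英. Research on time-series clustering algorithms based on dynamic time warping [J]. Computer Simulation, 2004, 21(3): 37-40. Cited by: 31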

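Citing literature (1)

1. 成万里, 余尚江, 卢亚. An outlier detection strategy for seismic monitoring time series [J]. Digital Technology and Application, 2014, 32(9): 40-41. Cited by: 2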

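Second-level citing literature (2)

1. 常俊, 乔波, 赵曦. Quantitative evaluation of the continuity and stability of earthquake precursor data [J]. Seismological and Geomagnetic Observation and Research, 2016, 37(1): 123-130.
2. 陈贤, 黄恩贤, 成万里, 余尚江. Identification of typical interference and data processing for observations from the FHD magnetometer at Xinyang station [J]. Ground Water, 2020, 42(2): 101-103. Cited by: 2

Related literature

1. 王振, 孙志刚. Noise analysis and denoising methods for scattered point clouds [J]. Computer & Digital Engineering, 2015, 43(9): 1668-1673. Cited by: 7
2. 吕天阳, 王钲旋, 左万利. A novel unsupervised clustering method based on outlier information [J]. Journal of Image and Graphics (Series A), 2004, 9(9): 1095-1100. Cited by: 1
3. 余泽. A mixed-attribute clustering ensemble algorithm based on relative density and entropy [J]. Computer Systems & Applications, 2014, 23(12): 125-130.
4. 罗旭. Grade analysis based on a self-organizing neural network (SOM) [J]. Silicon Valley, 2010, 3(11): 114.
5. 李今. Application of big data analysis in urban lighting management systems [J]. Software Guide, 2015, 14(5): 1-4. Cited by: 3
6. 聂建辉, 胡英, 马孜. A classification-based recognition algorithm for outliers in scattered point clouds [J]. Journal of Computer-Aided Design & Computer Graphics, 2011, 23(9): 1526-1532. Cited by: 27
7. CHE Tao, LI Xin, JIN Rui. Monitoring the frozen duration of Qinghai Lake using satellite passive microwave remote sensing low frequency data [J]. Chinese Science Bulletin, 2009, 54(13): 2294-2299. Cited by: 6
8. DUAN Hongmei, WANG Yanyan, XIE Zhenkai. The application of the BP neural network in the housing demand model [J]. International English Education Research, 2015(11): 64-67.
9. 张俊溪, 杨海粟. An outlier analysis method based on hierarchical clustering [J]. Computer Technology and Development, 2014, 24(8): 80-83. Cited by: 5
10. 尤元建, 黄增建. Research and implementation of a Hadoop management system [J]. China New Telecommunications, 2016, 18(17): 19-20.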
