
Relevant Random Subspace-Based Clustering Ensemble for Categorical Data (times cited: 2)
Abstract: To improve the quality of clustering ensembles for categorical data, this paper proposes a relevant random subspace-based clustering ensemble model. Based on rough set theory, the model first decomposes the full set of categorical attributes into relevant and irrelevant attribute subsets. It then randomly generates multiple relevant subspaces from the relevant attribute subset, clusters the categorical data in each subspace, and obtains the final clustering by combining multiple good and diverse partitions produced on these subspaces. Moreover, the model uses the rough-set concept of attribute reduction to determine the number of attributes in each relevant subspace, which effectively avoids the influence of this parameter on the clustering ensemble result. Experiments on selected UCI data sets show that the proposed model achieves better and more robust clustering performance than several representative clustering ensemble models for categorical data, demonstrating its effectiveness.
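The abstract only outlines the model, so the following Python sketch shows one plausible reading of the pipeline. It is not the authors' implementation. Several pieces are assumptions: the relevance test (mean single-attribute rough-set dependency degree compared with a threshold of 0.5), the use of a backward-elimination reduct size as the subspace dimension, plain k-modes as the base clusterer, and a co-association consensus with a majority-vote cut. It also omits the paper's selection of "good and diverse" members. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def classes(rows, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())

def dependency(rows, cond, dec):
    """Rough-set dependency degree gamma_cond(dec) = |POS_cond(dec)| / |U|."""
    pos = sum(len(c) for c in classes(rows, cond)
              if len({tuple(rows[i][d] for d in dec) for i in c}) == 1)
    return pos / len(rows)

def relevant_attributes(rows, threshold=0.5):
    """HYPOTHETICAL criterion: attribute a is 'relevant' if the other attributes,
    taken one at a time, determine it with mean dependency >= threshold."""
    m = len(rows[0])
    relevant = []
    for a in range(m):
        others = [b for b in range(m) if b != a]
        score = sum(dependency(rows, [b], [a]) for b in others) / max(1, len(others))
        if score >= threshold:
            relevant.append(a)
    return relevant

def reduct(rows, attrs):
    """Backward elimination: drop each attribute whose removal keeps IND unchanged.
    The result is a reduct of attrs, though not necessarily the smallest one."""
    target = len(classes(rows, attrs))
    red = list(attrs)
    for a in attrs:
        trial = [b for b in red if b != a]
        if len(classes(rows, trial)) == target:
            red = trial
    return red

def k_modes(rows, attrs, n_clusters, rng, iters=20):
    """Plain k-modes (Hamming distance, per-attribute mode) on the subspace attrs."""
    pts = [tuple(row[a] for a in attrs) for row in rows]
    uniq = sorted(set(pts), key=repr)  # sorted so the seeded sample is reproducible
    modes = rng.sample(uniq, min(n_clusters, len(uniq)))
    labels = [0] * len(pts)
    for _ in range(iters):
        labels = [min(range(len(modes)),
                      key=lambda j: sum(x != y for x, y in zip(p, modes[j])))
                  for p in pts]
        new_modes = []
        for j, old in enumerate(modes):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)) if members else old)
        if new_modes == modes:
            break
        modes = new_modes
    return labels

def co_association_consensus(partitions, n, cut=0.5):
    """Link objects grouped together in more than `cut` of the partitions,
    then return the connected components as consensus labels."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            votes = sum(p[i] == p[j] for p in partitions)
            if votes / len(partitions) > cut:
                parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

def relevant_subspace_ensemble(rows, n_clusters, n_members=10, seed=0):
    """Sketch: relevant attributes -> random relevant subspaces of reduct size
    -> k-modes members -> co-association consensus."""
    rng = random.Random(seed)
    rel = relevant_attributes(rows) or list(range(len(rows[0])))
    k = max(1, len(reduct(rows, rel)))  # ASSUMPTION: subspace size = reduct size
    members = [k_modes(rows, rng.sample(rel, k), n_clusters, rng)
               for _ in range(n_members)]
    return co_association_consensus(members, len(rows))
```

In a toy data set where three attributes carry the cluster structure and a fourth is unrelated noise, the noise attribute can receive a mean dependency degree of 0 and is then excluded before any subspace is drawn. The co-association consensus is used here because it needs no label matching across ensemble members; other consensus functions could be swapped in.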
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2013, No. 4, pp. 1082-1084 (3 pages)
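Funding: National Natural Science Foundation of China (70972062); Shanghai Philosophy and Social Sciences Planning Project (2011BGL011); Shanghai Key Discipline Project (S30504); Graduate Research Innovation Fund of Shanghai University of Finance and Economics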
Keywords: categorical data; rough sets; attribute reduction; relevant subspace; clustering ensemble

References (12)

  • [1] JAIN A K. Data clustering: 50 years beyond K-means[J]. Pattern Recognition Letters, 2010, 31(8): 651-666.
  • [2] JAIN A K, DUBES R C. Algorithms for clustering data[M]. New Jersey: Prentice-Hall, 1988.
  • [3] JIANG Yi-zhang, WANG Shi-tong. Subspace clustering algorithm for high-dimensional data based on a variance weight matrix model[J]. Application Research of Computers, 2012, 29(8): 2868-2871.
  • [4] STREHL A, GHOSH J. Cluster ensembles: a knowledge reuse framework for combining multiple partitions[J]. Journal of Machine Learning Research, 2002, 3(1): 583-617.
  • [5] YANG Lin-yun, WANG Wen-yuan. Survey of clustering ensemble methods[J]. Application Research of Computers, 2005, 22(12): 8-10.
  • [6] LI Tao, OGIHARA M, MA Sheng. On combining multiple clusterings: an overview and a new perspective[J]. Applied Intelligence, 2010, 33(2): 207-219.
  • [7] VEGA-PONS S, RUIZ-SHULCLOPER J. A survey of clustering ensemble algorithms[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2011, 25(3): 337-372.
  • [8] AGRESTI A. An introduction to categorical data analysis[M]. 2nd ed. New Jersey: Wiley, 2007.
  • [9] HE Zeng-you, XU Xiao-fei, DENG Sheng-chun. A cluster ensemble method for clustering categorical data[J]. Information Fusion, 2005, 6(2): 143-151.
  • [10] LI Tao-ying, CHEN Yan, ZHANG Jin-song, ZHANG Lin. Research on a clustering ensemble algorithm for categorical attribute data[J]. Application Research of Computers, 2011, 28(5): 1671-1673.

