Improved Initial Point Selection for Clustering Mixed-Attribute Data (cited by: 3)

Improved Clustering Algorithm for Mixed Numeric and Categorical Values
Abstract: The k-prototypes and fuzzy k-prototypes algorithms are the main clustering algorithms for data that mix numeric and categorical attributes. Both, however, depend noticeably on their initial values: they require random selection of the initial points for the clusters, so different initial points often lead to considerably different clustering results. This paper analyzes the random selection of initial values and proposes an improved method that searches for initial starting points by grouping the data set, which reduces the randomness to some extent. Simulation results on real data sets show that the improved algorithm has higher stability and good scalability.
Source: Journal of Guangxi Normal University: Natural Science Edition, 2007, No. 4, pp. 220-223 (4 pages). Indexed in: CAS, Peking University Core Journals.
Funding: National Natural Science Foundation of China (70171033); Natural Science Basic Research Fund for Universities of Jiangsu Province (07KJ520216)
Keywords: clustering; k-modes; k-prototypes; categorical data; dissimilarity
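
The abstract's point about sensitivity to initial points can be illustrated with a minimal sketch. It assumes the standard k-prototypes dissimilarity from Huang (1998): squared Euclidean distance on the numeric attributes plus a weight `gamma` times the number of mismatched categorical attributes. The function names, the toy records, and the `gamma` value are illustrative assumptions, and the sketch shows only a single assignment step. It does not reproduce the initialization method proposed in the paper, which this record does not describe in detail.

```python
def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma=1.0):
    """Dissimilarity between a record and a prototype (Huang-style, assumed).

    Numeric part: squared Euclidean distance.
    Categorical part: simple matching (count of differing attributes),
    weighted by gamma.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric + gamma * categorical


def assign(records, prototypes, gamma=1.0):
    """Assign each (numeric, categorical) record to its nearest prototype."""
    labels = []
    for num, cat in records:
        costs = [mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
                 for p_num, p_cat in prototypes]
        labels.append(costs.index(min(costs)))
    return labels


# Toy data (hypothetical): one numeric and one categorical attribute.
records = [
    ((0.0,), ("red",)),
    ((0.2,), ("red",)),
    ((5.0,), ("blue",)),
    ((5.1,), ("blue",)),
]

# Two different choices of initial prototypes drawn from the data.
well_spread = [records[0], records[2]]
badly_placed = [records[0], records[1]]

labels_spread = assign(records, well_spread)
labels_bad = assign(records, badly_placed)
print(labels_spread)
print(labels_bad)
```

With the well-spread initial points the first assignment step already separates the two visible groups, while the badly placed pair splits one group and merges the rest. A full k-prototypes run would continue by updating prototypes (means for numeric attributes, modes for categorical ones) and reassigning until the labels stop changing.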
References (8)

  • 1. HAN Jia-wei, KAMBER M. Data Mining: Concepts and Techniques [M]. 2nd ed. Beijing: China Machine Press, 2001.
  • 2. HUANG Zhe-xue. Extensions to the k-means algorithm for clustering large data sets with categorical values [J]. Data Mining and Knowledge Discovery, 1998, 2(3): 283-304.
  • 3. BARBARA D, CHEN Ping. Using self-similarity to cluster large data sets [J]. Data Mining and Knowledge Discovery, 2003, 7(2): 123-152.
  • 4. MODHA D S, SPANGLER W S. Feature weighting in k-means clustering [J]. Machine Learning, 2003, 52(3): 217-237.
  • 5. ZHAO Li-jiang. Research on clustering algorithms for data sets with mixed numeric and categorical attributes [D]. Hangzhou: College of Computer Science and Engineering, Zhejiang University, 2005.
  • 6. SUN Ying, ZHU Qiu-ming, CHEN Zheng-xin. An iterative initial-points refinement algorithm for categorical data clustering [J]. Pattern Recognition Letters, 2002, 23(7): 875-884.
  • 7. GAN Guo-jun, YANG Zi-jiang, WU Jian-hong. A genetic k-modes algorithm for clustering categorical data [C]//Proceedings of First International Conference on Advanced Data Mining and Applications. Berlin: Springer, 2005: 195-202.
  • 8. PENG Ling. A new dynamic evolutionary clustering algorithm (in English) [J]. Journal of Guangxi Normal University: Natural Science Edition, 2006, 24(4): 103-106. (cited by: 1)
