An improved K-means algorithm for the data-missing problem

Cited by: 5
Abstract: This paper addresses clustering analysis when part of the data is missing and proposes an improved K-means clustering algorithm. The algorithm changes two steps of the original: the metric used to measure the distance from each data point to each cluster, and the method for generating new cluster centers, so that null values are masked out. Tests on the Iris data set from UCI, with some values removed at random, show that the algorithm can cluster data sets with missing values directly and effectively masks the effect of missing data on the clustering result.
Source: Journal of Hefei University of Technology (Natural Science), 2008, No. 9, pp. 1455-1457 (3 pages). Indexed in: CAS, CSCD, PKU Core.
Keywords: clustering analysis; K-means algorithm; missing data
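
The abstract names the two modified steps (distance measurement and center generation) but does not give their formulas. The following is a minimal sketch of one common way to realize such masking, not the authors' exact method. It assumes missing values are encoded as NaN, that the distance is a squared Euclidean distance over observed attributes rescaled by the fraction observed, and that each center coordinate is the mean of the observed values in that cluster. The function names `partial_distance`, `update_centers`, and `kmeans_missing` are hypothetical.

```python
import numpy as np


def partial_distance(x, center):
    """Squared Euclidean distance using only the observed (non-NaN) attributes of x.

    Assumption: the sum is rescaled by d / n_observed so that points with
    different numbers of missing attributes remain comparable.
    """
    mask = ~np.isnan(x)
    if not mask.any():
        return np.inf  # a fully empty record carries no distance information
    diff = x[mask] - center[mask]
    return float(diff @ diff) * (x.size / mask.sum())


def update_centers(X, labels, centers):
    """Recompute each center coordinate from the observed values only."""
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        members = X[labels == k]
        for j in range(X.shape[1]):
            column = members[:, j]
            observed = column[~np.isnan(column)]
            if observed.size:  # otherwise keep the previous coordinate
                new_centers[k, j] = observed.mean()
    return new_centers


def kmeans_missing(X, init_centers, n_iter=100):
    """K-means loop that never imputes: NaN entries are skipped in both steps."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    labels = None
    for _ in range(n_iter):
        dists = np.array([[partial_distance(x, c) for c in centers] for x in X])
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        centers = update_centers(X, labels, centers)
    return labels, centers
```

Initial centers are passed in explicitly here to keep the sketch deterministic; how the paper initializes centers is not stated in the abstract.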