
A document clustering algorithm based on similarity (cited by: 2)
Abstract: To address the shortcomings of common information retrieval techniques, this paper proposes a similarity-based document clustering algorithm. The document collection is converted into a set of vectors, and clusters are obtained by agglomerative hierarchical clustering based on the cosine similarity between those vectors. A detailed description of the algorithm and a test example are given.
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2002, No. 12, pp. 59-61 (3 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Keywords: similarity; document clustering algorithm; cluster analysis; nearest cluster; information retrieval; partitioning
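
The pipeline summarized in the abstract (documents to vectors, cosine similarity, agglomerative hierarchical clustering) can be illustrated with a minimal sketch. The abstract does not specify the term weighting, the linkage rule, or the stopping criterion, so the raw term-frequency vectors, the average-linkage similarity, the `threshold` parameter, and all function names below are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter
from itertools import combinations


def to_vector(text):
    """Turn a document into a sparse term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_similarity(a, b, vectors):
    """Average pairwise cosine similarity between two clusters (assumed linkage)."""
    total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative(docs, threshold=0.3):
    """Merge the two most similar clusters until no pair reaches `threshold`."""
    vectors = [to_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # start from singleton clusters
    while len(clusters) > 1:
        best_sim, (x, y) = max(
            (cluster_similarity(a, b, vectors), (x, y))
            for (x, a), (y, b) in combinations(enumerate(clusters), 2)
        )
        if best_sim < threshold:
            break  # hypothetical stopping rule: remaining clusters are too dissimilar
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return [sorted(c) for c in clusters]


# Hypothetical toy corpus: two documents about cats, two about markets.
docs = ["cat sat mat", "cat sat hat", "stock market fell", "stock market rose"]
print(agglomerative(docs))
```

On this toy corpus the two documents in each topic share two of three terms (cosine similarity 2/3) and nothing with the other topic, so the sketch groups documents 0 and 1 together and documents 2 and 3 together. A real system would more plausibly use TF-IDF weights and a precomputed similarity matrix rather than recomputing pairwise similarities on every merge.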

References (5)

  • 1. Wang K, Zhou S, Liew S C. Building hierarchical classifiers using class proximity. In: Proc. 1999 Int. Conf. Very Large Data Bases (VLDB'99). Edinburgh, 1999. 363-374
  • 2. Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases. In: Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'98). Seattle, 1998. 73-84
  • 3. Han Jiawei, Kamber M. Data Mining: Concepts and Techniques (Chinese edition) [M]. Beijing: China Machine Press, 2001. 152-160
  • 4. Zou Tao, Huang Yuan, Zhang Fuyan. WWW-based text information mining [J]. Journal of the China Society for Scientific and Technical Information (情报学报), 1999, 18(4): 291-295 (cited by: 47)
  • 5. Karypis G, Han E H, Kumar V. Chameleon: a hierarchical clustering algorithm using dynamic modeling. Computer, 1999, 32: 68-75

