
A Survey of Document Clustering (文档聚类综述)

Cited by: 65
Abstract: As an unsupervised machine learning method with a high degree of automation, clustering has in recent years been widely applied in areas such as information retrieval and automatic multi-document summarization. This paper first discusses the application background and architecture of document clustering. It then surveys document clustering algorithms, the construction of the clustering feature space and dimension reduction methods, and semantic issues in document clustering. Finally, it introduces the evaluation of clustering quality.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2006, No. 3, pp. 55-62 (8 pages)
Funding: Key Program of the National Natural Science Foundation of China (60435020)
Keywords: computer application; Chinese information processing; survey; document clustering; dimension reduction; concept relevance; clustering algorithm


