Abstract
As a highly automated unsupervised machine learning method, document clustering has in recent years been widely used in NLP applications such as information retrieval and automatic multi-document summarization. This paper first discusses the application background and architecture of document clustering, then surveys clustering algorithms, the construction and dimensionality reduction of the clustering feature space, and semantic issues in document clustering. Finally, it introduces the evaluation of clustering quality.
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2006, Issue 3, pp. 55-62 (8 pages)
Funding
Key Program of the National Natural Science Foundation of China (60435020)
Keywords
computer application
Chinese information processing
overview
document clustering
dimension reduction
concept relevance
clustering algorithm