Abstract
This paper analyzes data preprocessing, clustering algorithms, and result evaluation in web text clustering. Building on a search engine constructed with Lucene and Nutch, it proposes a design for a web page clustering system based on the k-means algorithm and describes the design and implementation of each module.
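The three stages named in the abstract (preprocessing, k-means clustering, and evaluation) can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation: the regex tokenizer, the smoothed TF-IDF weighting, the farthest-first initialization, the purity measure, and the sample documents are all assumptions chosen for illustration, and a real system would presumably take its text from the Nutch crawl and Lucene index instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase ASCII words only.
    # Chinese pages would need a word segmenter instead.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build L2-normalized TF-IDF vectors over a shared vocabulary.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumption, not taken from the paper).
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    # Deterministic farthest-first initialization (assumed here so the
    # example is reproducible; random seeding is also common).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, truth):
    # Fraction of documents that belong to the majority class of their cluster.
    total = 0
    for j in set(labels):
        counts = Counter(t for l, t in zip(labels, truth) if l == j)
        total += counts.most_common(1)[0][1]
    return total / len(labels)


# Hypothetical sample pages and reference classes.
docs = [
    "lucene index search engine query",
    "nutch crawler search engine index",
    "football match goal team player",
    "team player goal league football",
]
truth = ["search", "search", "sports", "sports"]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels, purity(labels, truth))
```

Purity is only one possible evaluation measure and requires reference classes; it is used here because it is simple to compute by hand on a small sample.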
Source
Journal of Hanshan Normal University (《韩山师范学院学报》)
2008, No. 6, pp. 27-30 (4 pages)
Funding
Hanshan Normal University Supported Research Project (FC200506)