Abstract
1. Introduction
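In recent years, with the rapid growth of the Internet, Web-based data mining has attracted increasing attention, and clustering, a technique widely used in text mining, information retrieval and related fields, has become an active research topic. Clustering is the process of taking a set of concrete or abstract elements and grouping similar elements into the same class [1]. Clustering applied to textual information, such as scientific literature or Web documents, is called document clustering. Document clustering was originally used mainly to improve the precision and recall of information retrieval systems, or to find the documents most similar to a given document [2]. Today it is also used to obtain a set of documents that meets a user's requirements and to rank that set according to the user's needs. On the Internet, document clustering can further be used to build a hierarchy of document clusters automatically, which in turn supports the classification of Web documents.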
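To make the definition of clustering concrete ("grouping similar elements into the same class"), here is a minimal sketch of one very simple document clustering scheme, single-pass (leader) clustering. It is an illustration only, not the method of this paper: the function names, the whitespace tokenizer, the use of Jaccard similarity on word sets, and the threshold of 0.3 are all hypothetical choices made for the example.

```python
def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it on whitespace into a set of words."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| (defined as 0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_pass_cluster(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Hypothetical single-pass clustering: put each document into the first
    cluster whose seed (first) document is at least `threshold` similar to it;
    otherwise start a new cluster seeded by this document."""
    token_sets = [tokenize(d) for d in docs]
    clusters: list[list[int]] = []  # each cluster is a list of document indices
    for i, tokens in enumerate(token_sets):
        for cluster in clusters:
            seed = token_sets[cluster[0]]
            if jaccard(tokens, seed) >= threshold:
                cluster.append(i)
                break
        else:  # no existing cluster was similar enough
            clusters.append([i])
    return clusters


docs = [
    "neural network training",
    "neural network weights",
    "web document retrieval",
    "web document clustering",
]
print(single_pass_cluster(docs))
```

With these four toy documents the two "neural network" titles share a cluster and the two "web document" titles share another. Single-pass clustering was chosen only because it is short; its result depends on document order, which is one reason practical systems use more careful algorithms.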
Document clustering has been used in a number of areas of text mining and information retrieval. This paper first introduces document clustering and its foundation, the Vector Space Model (VSM). In contrast to the VSM, we then present a new model, based on a BP neural network, for calculating the weight of a word in a document. On this basis, we describe two document clustering algorithms aimed at scientific literature on the Web: one obtains document sets relevant to a user's query, and the other extracts items that better match a user's personal interests.
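For reference, the classic VSM that the abstract uses as its point of comparison represents each document as a vector of term weights and compares documents by cosine similarity. The sketch below uses the common TF-IDF weighting, weight(t, d) = tf(t, d) · log(N / df(t)); it is a standard textbook baseline, not the paper's BP-neural-network weighting model, and the helper names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Build classic VSM vectors with weight(t, d) = tf(t, d) * log(N / df(t))."""
    n = len(docs)
    vocab = sorted({t for doc in docs for t in doc})
    df = Counter(t for doc in docs for t in set(doc))  # document frequency of each term
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [
    "neural network word weight".split(),
    "neural network training".split(),
    "web document clustering".split(),
]
vocab, vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # small positive value: shared terms "neural", "network"
print(cosine(vecs[0], vecs[2]))  # no shared terms, so the similarity is 0.0
```

Note that a term occurring in every document receives weight log(1) = 0 under this scheme. A learned weighting model such as the one the paper proposes would replace the body of `tfidf_vectors` while keeping the same vector-and-cosine framework.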
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2002, No. 8, pp. 93-95 (3 pages)