Abstract
The Internet is a huge, widely distributed, and highly dynamic global information service center, and finding relevant information on it is difficult. Most users search by submitting short keywords to a search engine, but search engines usually return too many relevant results, which makes processing them time-consuming. In this paper, we propose a Semantic Virtual Document (SVD) to represent web documents. On this basis, we implement a hierarchical agglomerative clustering algorithm that automatically clusters web documents with similar content. As a result, on the one hand, users can judge the relevance of results more easily and find the information they want on the Internet quickly and efficiently; on the other hand, the returned results enhance web content mining in terms of knowledge representation.
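The clustering step named in the abstract can be illustrated with a minimal sketch. The abstract does not describe how a Semantic Virtual Document is built, so the code below substitutes a plain bag-of-words vector with cosine similarity as a stand-in. The function names (`vectorize`, `cosine`, `hac`), the average-link merge criterion, and the `threshold` parameter are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def vectorize(text):
    # Bag-of-words term counts: an assumed stand-in for the paper's
    # SVD representation, which the abstract does not specify.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hac(docs, threshold=0.2):
    # Agglomerative clustering with average linkage (an assumed choice).
    # Start with one cluster per document and repeatedly merge the most
    # similar pair until no pair reaches the similarity threshold.
    vecs = [vectorize(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def avg_sim(c1, c2):
        total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    results = [
        "web search engine results",
        "search engine returns web results",
        "hierarchical clustering of documents",
        "clustering documents hierarchical method",
    ]
    for group in hac(results):
        print([results[i] for i in group])
```

Recording the order of merges instead of stopping at a threshold would yield the full cluster hierarchy; the threshold is used here only to keep the sketch short.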
Source
《微计算机信息》
Peking University Core Journal
2006, Issue 08X, pp. 302-304 (3 pages)
Control & Automation
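Funding
Natural Science Foundation of Shaanxi Province (98X11)
Key Scientific Research Program of the Shaanxi Provincial Department of Education (00JK015)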