Abstract
Keyword-based search engines meet some user needs, but because they are general-purpose they cannot satisfy users' personalized requirements. To address this, a personalized Web information auto-retrieval system based on examples is designed and implemented. The system adopts a new feature extraction algorithm and a new method for setting the filtering threshold, both based on a small number of example Web pages and term-frequency statistics from a corpus. Experimental results show that, compared with keyword-based search engines, the system takes full account of the user's interest preferences (the examples) and provides a more accurate Web information retrieval service to the user, proactively and over the long term.
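The abstract names the two ingredients (feature extraction from a few example pages against corpus term frequencies, and a filtering threshold) without giving their formulas. The following is only a minimal sketch of that general idea, not the paper's algorithm: the scoring rule (example-versus-corpus log ratio), the cosine similarity, the `margin` factor, and all function names are assumptions introduced here for illustration.

```python
import math
from collections import Counter


def extract_features(example_docs, corpus_freq, corpus_total, top_k=5):
    """Pick terms that are much more frequent in the examples than in the corpus.

    Assumed scoring rule (not from the paper): p_ex * log(p_ex / p_corpus).
    example_docs is a list of token lists; corpus_freq maps term -> count.
    """
    counts = Counter(tok for doc in example_docs for tok in doc)
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        p_example = count / total
        # Add-one smoothing so terms missing from the corpus do not divide by zero.
        p_corpus = (corpus_freq.get(term, 0) + 1) / (corpus_total + 1)
        scores[term] = p_example * math.log(p_example / p_corpus)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Keep only terms over-represented in the examples (positive score).
    return {term: score for term, score in ranked if score > 0}


def cosine(profile, doc):
    """Cosine similarity between the weighted profile and a page's raw term counts."""
    tf = Counter(doc)
    dot = sum(weight * tf[term] for term, weight in profile.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_d = math.sqrt(sum(v * v for v in tf.values()))
    return dot / (norm_p * norm_d) if norm_p and norm_d else 0.0


def filtering_threshold(profile, example_docs, margin=0.8):
    """Assumed rule: a fraction of the lowest score among the user's own examples."""
    return margin * min(cosine(profile, doc) for doc in example_docs)
```

Under these assumptions, a newly crawled page would be delivered to the user when `cosine(profile, page)` is at least `filtering_threshold(profile, example_docs)`; common words such as "the" drop out of the profile because the corpus statistics show they are not distinctive.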
Source
Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》)
CAS
2006, Issue 4, pp. 44-49 (6 pages)