Abstract
This paper presents CITC, a Community-Specific Intelligent Topic Crawler. The system first mines the Web logs of a user community to build a corresponding knowledge base. Guided by this knowledge base, CITC applies a multi-layer selection strategy to crawl pages selectively. Experimental results show that the system can effectively fetch target pages that match the interests of the user community.
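As a rough illustration of the pipeline the abstract describes (mine community logs into a knowledge base, then use it to decide which pages to crawl), a minimal sketch follows. The abstract does not give the paper's mining method, relevancy formula, or the layers of its selection strategy, so the term-frequency weighting, the threshold, and the function names `build_knowledge_base`, `relevancy`, and `select_links` are all assumptions made for illustration and are not CITC's actual design.

```python
import re
from collections import Counter


def build_knowledge_base(log_entries):
    """Assumed stand-in for log mining: normalized term frequencies
    over the text of the community's log entries."""
    counts = Counter()
    for entry in log_entries:
        counts.update(re.findall(r"[a-z0-9]+", entry.lower()))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def relevancy(text, kb):
    """Assumed relevancy score: sum of knowledge-base weights of the
    distinct terms that occur in the text."""
    terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(kb.get(term, 0.0) for term in terms)


def select_links(links, kb, threshold):
    """Keep the URLs of (url, anchor_text) pairs whose anchor text scores
    at least `threshold`, ordered from most to least relevant."""
    scored = [(relevancy(anchor, kb), url) for url, anchor in links]
    return [url for score, url in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical community log entries and candidate links.
kb = build_knowledge_base(["python crawler", "topic crawler", "python web"])
frontier = select_links(
    [("a", "python crawler guide"), ("b", "cooking recipes"), ("c", "web design")],
    kb,
    threshold=0.2,
)
```

In this sketch only link "a" clears the threshold, so a crawler would fetch it and skip the others. A second check of the same kind could be run on the fetched page text before the page is stored, which is one possible reading of a multi-layer selection.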
Source
Journal of Guangxi Normal University: Natural Science Edition
CAS
PKU Core
2007, No. 2, pp. 230-233 (4 pages)
Funding
Supported by the Natural Science Foundation of Gansu Province (3ZS051-A25-035)
Keywords
user community
dual page filtering
knowledge base
topic crawler
relevancy