
Adaptive focused crawling method based on information gain (cited by: 3)
Abstract: This paper proposes a new adaptive focused crawling strategy that incorporates information gain. A topic vector T is built from Wikipedia's category tree and the topic's descriptive article. During crawling, the crawler learns automatically and feeds the results back to update the weight of each concept in the topic vector space, refining the topic description. Experimental results show that the method gives the focused crawler an incremental crawling ability and that it clearly outperforms the adaptive strategy based on the interest ratio in terms of the sum of information; the Web pages it crawls are also more closely related to the topic.
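The abstract does not give the exact update rule, so the following is only a minimal sketch of how concept weights in a topic vector might be refreshed from information gain. It uses the standard information gain definition over pages labelled relevant or not relevant. All names (`information_gain`, `update_topic_vector`, `rate`) and the blend-then-normalise update are assumptions made for illustration, not the authors' method.

```python
import math
from typing import Dict, List, Set, Tuple

# A crawled page: (set of concepts found in it, was it judged relevant?)
Page = Tuple[Set[str], bool]


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a relevant / not-relevant split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def subset_entropy(pages: List[Page]) -> float:
    """Entropy of the relevance label within a list of pages."""
    pos = sum(1 for _, relevant in pages if relevant)
    return entropy(pos, len(pages) - pos)


def information_gain(pages: List[Page], concept: str) -> float:
    """Standard IG of 'concept occurs in page' w.r.t. the relevance label."""
    n = len(pages)
    if n == 0:
        return 0.0
    with_c = [p for p in pages if concept in p[0]]
    without_c = [p for p in pages if concept not in p[0]]
    conditional = (len(with_c) / n) * subset_entropy(with_c) \
        + (len(without_c) / n) * subset_entropy(without_c)
    return subset_entropy(pages) - conditional


def update_topic_vector(topic: Dict[str, float], pages: List[Page],
                        rate: float = 0.5) -> Dict[str, float]:
    """Hypothetical feedback step: blend old weights with IG, then normalise."""
    blended = {c: (1 - rate) * w + rate * information_gain(pages, c)
               for c, w in topic.items()}
    total = sum(blended.values())
    if total <= 0:
        return dict(topic)  # nothing learned; keep the old weights
    return {c: w / total for c, w in blended.items()}


crawled = [
    ({"crawler", "web"}, True),
    ({"crawler"}, True),
    ({"music", "web"}, False),
    ({"music"}, False),
]
new_topic = update_topic_vector({"crawler": 0.5, "web": 0.5}, crawled)
```

In this toy run, "crawler" separates relevant from irrelevant pages perfectly while "web" carries no information, so the feedback step shifts weight toward "crawler".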
Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core Journal, 2012, No. 2, pp. 501-503 (3 pages)
Funding: Central Universities Graduate Science and Technology Innovation Fund, individual project (CDJXS11180014)
Keywords: focused crawling; Wikipedia; topic description; adaptive method; information gain

