Focused Crawling Based on Incremental Ontology Learning
Abstract: In domain-oriented information search, an ontology serves as relevant domain knowledge that often improves search results, and ontologies are widely used in information retrieval. This paper proposes a focused crawling method based on incremental ontology learning. First, the ontology's description of domain concepts and their relationships is used as the basis for judging whether a web page is on topic. Second, during crawling, newly learned concepts and relationships are added to the ontology to enrich and refine the domain ontology, which in turn raises the harvest rate of the focused crawler. Finally, analysis of a large body of experimental data, covering term extraction accuracy, harvest rate, and response speed, shows that the proposed method is feasible and efficient.
Authors: WANG Xin (王鑫), WANG Ying (王英)
Source: Journal of Changchun Institute of Technology: Natural Sciences Edition (《长春工程学院学报(自然科学版)》), 2010, No. 4, pp. 81-85 (5 pages)
Funding: National Natural Science Foundation of China (60973040); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (200801830021)
Keywords: focused crawler; incremental ontology learning; concept tree; SF-CF model; quasi-term window extraction model
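
The abstract describes a loop in which the ontology decides page relevance and is extended with newly learned concepts as the crawl proceeds. The following is a minimal Python sketch of that general idea, not the authors' implementation: the record does not describe the SF-CF model or the quasi-term window extraction model, so a simple token-overlap score and a frequency-based term learner stand in for them. All names (`relevance`, `learn_concepts`, `focused_crawl`), the threshold, and the in-memory toy pages are assumptions made for illustration. Harvest rate is computed here in the usual sense, as relevant pages divided by pages fetched.

```python
import re
from collections import Counter, deque


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(text, ontology):
    """Toy relevance score: fraction of tokens that name an ontology concept.
    (Assumption: stands in for the paper's SF-CF model, which is not given.)"""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in ontology) / len(tokens)


def learn_concepts(text, ontology, stopwords, min_count=2):
    """Toy learner: terms outside the ontology that recur on a relevant page.
    (Assumption: stands in for the quasi-term window extraction model.)"""
    counts = Counter(
        t for t in tokenize(text) if t not in ontology and t not in stopwords
    )
    return {t for t, c in counts.items() if c >= min_count}


def focused_crawl(pages, links, seed, ontology, threshold, stopwords=frozenset()):
    """Breadth-first crawl over an in-memory page graph.
    Only relevant pages have their links followed, and each relevant page
    may add new concepts to the ontology before the next page is scored."""
    ontology = set(ontology)          # work on a copy; the caller's set is untouched
    frontier, seen = deque([seed]), {seed}
    relevant, fetched = [], 0
    while frontier:
        url = frontier.popleft()
        fetched += 1
        text = pages[url]
        if relevance(text, ontology) >= threshold:
            relevant.append(url)
            ontology |= learn_concepts(text, ontology, stopwords)
            for nxt in links.get(url, []):
                if nxt not in seen and nxt in pages:
                    seen.add(nxt)
                    frontier.append(nxt)
    harvest_rate = len(relevant) / fetched if fetched else 0.0
    return relevant, ontology, harvest_rate


# Hypothetical toy data: three pages, one seed ontology with two concepts.
pages = {
    "a": "ontology crawler ontology learning reasoner reasoner",
    "b": "reasoner engines and reasoner benchmarks",
    "c": "cooking recipes for pasta and soup",
}
links = {"a": ["b", "c"]}
seed_ontology = {"ontology", "crawler"}

relevant, learned, rate = focused_crawl(
    pages, links, "a", seed_ontology, threshold=0.3, stopwords={"and", "for"}
)
print(relevant, sorted(learned), round(rate, 3))
```

In this toy run, page `b` shares no terms with the seed ontology and is accepted only because `reasoner` was learned from page `a`, which is the effect on harvest rate that the abstract attributes to incremental learning. A real crawler would fetch pages over HTTP and would need a far more careful term filter to avoid topic drift.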
