Abstract
In domain-oriented information search, an ontology supplies relevant domain knowledge that often improves search results, and ontologies are widely used in information retrieval. This paper proposes a focused crawling method based on incremental ontology learning. First, the ontology's description of domain concepts and their relationships serves as the basis for judging whether a web page is on topic. Second, during crawling, newly learned concepts and relationships are added to the ontology to enrich and refine the domain ontology, which in turn raises the focused crawler's harvest rate. Finally, analysis of extensive experimental data on several metrics, including term-extraction accuracy, harvest rate, and response speed, shows that the proposed method is feasible and efficient.
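The loop described in the abstract (judge a page against the ontology, then feed newly learned concepts back into it) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Ontology` class, the weighted-match relevance score, the frequency-based learning rule, and the values of `threshold`, `min_count`, and `new_weight` are all hypothetical, since the abstract does not specify them.

```python
from collections import Counter


class Ontology:
    """Toy domain ontology (hypothetical): concept weights plus related pairs."""

    def __init__(self, concepts):
        self.weights = dict(concepts)   # concept -> weight
        self.relations = set()          # (new_concept, known_concept) pairs

    def relevance(self, tokens):
        """Average concept weight over the page's tokens (assumed scoring rule)."""
        if not tokens:
            return 0.0
        return sum(self.weights.get(t, 0.0) for t in tokens) / len(tokens)

    def learn(self, tokens, min_count=2, new_weight=0.5):
        """Add frequent unknown terms as concepts, related to known ones on the page."""
        known = {t for t in tokens if t in self.weights}
        counts = Counter(t for t in tokens if t not in self.weights)
        for term, n in counts.items():
            if n >= min_count:
                self.weights[term] = new_weight
                self.relations.update((term, k) for k in known)


def crawl(pages, ontology, threshold=0.3):
    """Visit (url, text) pairs in order; return relevant URLs and the harvest rate."""
    relevant = []
    for url, text in pages:
        tokens = text.lower().split()
        if ontology.relevance(tokens) >= threshold:
            relevant.append(url)
            ontology.learn(tokens)      # incremental learning from on-topic pages only
    harvest_rate = len(relevant) / len(pages) if pages else 0.0
    return relevant, harvest_rate


if __name__ == "__main__":
    onto = Ontology({"crawler": 1.0, "ontology": 1.0})
    pages = [
        ("a", "crawler ontology spider spider index"),
        ("b", "spider spider spider web"),
        ("c", "cooking recipes pasta"),
    ]
    print(crawl(pages, onto))
```

In this toy run, page `b` contains no concept from the initial ontology and is accepted only because `spider` was learned from page `a`, which is the mechanism by which incremental learning is meant to raise the harvest rate (relevant pages fetched divided by all pages fetched).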
Source
《长春工程学院学报(自然科学版)》
2010, No. 4, pp. 81-85 (5 pages)
Journal of Changchun Institute of Technology: Natural Sciences Edition
Funding
National Natural Science Foundation of China (60973040)
Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (200801830021)