Focused Crawling Based on Incremental Ontology Learning (基于本体增量学习的主题爬行)

Abstract: In domain-oriented information search, an ontology, as relevant domain knowledge, often helps improve search results and is widely used in information retrieval. This paper proposes a focused crawling method based on incremental ontology learning. First, the ontology's description of domain concepts and their relationships serves as the basis for judging the topic of a web page. Second, during crawling, newly learned concepts and relationships are added to the ontology to enrich and refine the domain ontology, which in turn improves the harvest rate of the focused crawler. Finally, analysis of extensive experimental data, across metrics including term extraction accuracy, harvest rate, and response speed, shows that the proposed method is feasible and efficient.
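The loop described in the abstract (judge a page against the ontology, learn new terms from relevant pages, track the harvest rate) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the SF-CF model or the quasi-term window extraction model, so the sketch substitutes a plain term-overlap score and a page-frequency rule. The names `relevance`, `crawl`, `threshold`, and `min_support` are hypothetical, and the ontology is reduced to a flat set of concept terms. Harvest rate is taken in its usual sense, the fraction of crawled pages judged relevant.

```python
from collections import Counter


def relevance(tokens, concepts):
    """Fraction of a page's tokens that match a known ontology concept."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in concepts) / len(tokens)


def crawl(pages, concepts, threshold=0.3, min_support=2):
    """Process tokenized pages in order.

    Returns (harvest_rate, enriched_concepts). Assumption: a term is
    "learned" once it has appeared in at least `min_support` relevant
    pages; this stands in for the paper's unspecified learning models.
    """
    concepts = set(concepts)          # copy so the caller's set is untouched
    candidate_counts = Counter()      # unknown term -> relevant pages seen in
    relevant = 0
    for tokens in pages:
        if relevance(tokens, concepts) >= threshold:
            relevant += 1
            # Count each unknown term at most once per relevant page.
            candidate_counts.update(set(tokens) - concepts)
            # Promote sufficiently frequent candidates into the ontology.
            learned = {t for t, c in candidate_counts.items() if c >= min_support}
            concepts |= learned
            for t in learned:
                del candidate_counts[t]
    harvest_rate = relevant / len(pages) if pages else 0.0
    return harvest_rate, concepts


pages = [
    ["ontology", "crawler", "spider", "news"],
    ["crawler", "spider", "index"],
    ["spider", "index", "sports"],
    ["sports", "weather"],
]
rate, enriched = crawl(pages, {"ontology", "crawler"})
print(rate, sorted(enriched))
```

In this toy run the third page is judged relevant only because "spider" was learned from the first two pages, which is the effect the abstract attributes to incremental learning. Setting `min_support` very high disables learning and gives a baseline for comparison.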

Authors: Wang Xin (王鑫), Wang Ying (王英)

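Affiliations: Software Vocational and Technical College, Changchun Institute of Technology; College of Computer Science and Technology, Jilin University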

Source: Journal of Changchun Institute of Technology (Natural Sciences Edition), 2010, No. 4, pp. 81-85 (5 pages)

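Funding: National Natural Science Foundation of China (60973040); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (200801830021)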

Keywords: focused crawler; incremental ontology learning; concept tree; SF-CF model; quasi-term window extraction model

Classification: TP31 [Automation and Computer Technology: Computer Software and Theory]
