A new focused crawler using an improved tabu search algorithm incorporating ontology and host information

Abstract: To address incomplete topic description and repeated crawling of visited hyperlinks in traditional focused crawling methods, we propose a novel focused crawler using an improved tabu search algorithm with domain ontology and host information (FCITS_OH). A domain ontology is constructed by formal concept analysis to describe topics at the semantic and knowledge levels. To avoid crawling visited hyperlinks and to expand the search range, we present an improved tabu search (ITS) algorithm and a host information memory strategy. In addition, a comprehensive priority evaluation method based on Web text and link structure is designed to improve the assessment of topic relevance for unvisited hyperlinks. Experimental results in both the tourism and rainstorm disaster domains show that the proposed focused crawlers outperform traditional focused crawlers across different performance metrics.
Source: Frontiers of Information Technology & Electronic Engineering (信息与电子工程前沿(英文版)), indexed in SCIE, EI, CSCD; 2023, Issue 6, pp. 859-875 (17 pages)
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation of China (Nos. 2021A1515011974 and 2023A1515011344) and the Program of Science and Technology of Guangzhou, China (No. 202002030238).
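
Illustrative sketch (not from the paper): the abstract mentions two mechanisms for avoiding repeated crawling, a tabu list of visited hyperlinks and a memory of host information. The record does not give the ITS algorithm or the priority formula, so the Python below is only a rough guess at how such a crawl frontier might be organized. The class name `Frontier`, the `host_penalty` weight, and the linear penalty on already-visited hosts are all hypothetical, and the relevance score is assumed to be supplied by the caller.

```python
import heapq
from collections import Counter
from urllib.parse import urlsplit


class Frontier:
    """Priority frontier with a tabu set of URLs and a per-host visit memory.

    Hypothetical illustration only; this is not the FCITS_OH algorithm.
    """

    def __init__(self, host_penalty=0.1):
        self._heap = []                 # entries: (-score, insertion order, url)
        self._order = 0                 # tie-breaker so equal scores pop in FIFO order
        self.tabu = set()               # URLs already queued or crawled
        self.host_visits = Counter()    # assumed host memory: crawls per host
        self.host_penalty = host_penalty  # assumed weight, not taken from the paper

    def push(self, url, relevance):
        """Queue url unless it is tabu. Returns True if the URL was queued."""
        if url in self.tabu:
            return False
        self.tabu.add(url)
        host = urlsplit(url).netloc
        # The score is fixed at push time: relevance minus a penalty that
        # grows with the number of pages already crawled from this host.
        score = relevance - self.host_penalty * self.host_visits[host]
        heapq.heappush(self._heap, (-score, self._order, url))
        self._order += 1
        return True

    def pop(self):
        """Return the highest-scoring queued URL, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, url = heapq.heappop(self._heap)
        self.host_visits[urlsplit(url).netloc] += 1
        return url
```

In this sketch a URL enters the tabu set as soon as it is queued, so it can never be scheduled twice, and the host penalty nudges the crawler toward hosts it has visited less often, which is one plausible reading of "expanding the search range".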