Search Strategy of Focused Crawler Based on Adaptive Immune Evolutionary Algorithm
Abstract The focused crawler is the core component of a topic-specific search engine. To address the shortcomings of current focused-crawler search strategies, the paper proposes a comprehensive relevance score that combines topic relevance with page importance to judge whether a page is on topic, and uses an adaptive immune evolutionary algorithm as the search strategy that guides the crawl. Experimental results show that the proportion of topic-relevant pages among those downloaded is markedly higher with this algorithm than with best-first search or breadth-first search, indicating higher search efficiency.
Source Journal of Heilongjiang Bayi Agricultural University, 2012, No. 4, pp. 61-64 (4 pages)
Funding Science and Technology Research Project of the Heilongjiang Provincial Department of Education (No. 11551015)
Keywords focused crawler; search strategy; topic relevance; adaptive immune evolutionary algorithm
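
The abstract says only that topic relevance and page importance are combined into a single score. It does not give the formula. As a rough illustration, the sketch below assumes a weighted sum of a cosine-similarity topic score and a precomputed importance value in [0, 1]. The function names, the `alpha` weight, and the sample terms are hypothetical and are not taken from the paper. The adaptive immune evolutionary search itself is not sketched here.

```python
import math
from collections import Counter


def topic_relevance(page_terms, topic_terms):
    """Cosine similarity between term-frequency vectors of a page and a topic."""
    page_tf, topic_tf = Counter(page_terms), Counter(topic_terms)
    dot = sum(page_tf[t] * topic_tf[t] for t in topic_tf)
    norm = math.sqrt(sum(v * v for v in page_tf.values())) * math.sqrt(
        sum(v * v for v in topic_tf.values())
    )
    return dot / norm if norm else 0.0


def comprehensive_relevance(page_terms, topic_terms, importance, alpha=0.7):
    """Weighted sum of topic relevance and page importance.

    Assumption: both inputs lie in [0, 1]; alpha is a hypothetical mixing
    weight, not a value reported in the paper.
    """
    return alpha * topic_relevance(page_terms, topic_terms) + (1 - alpha) * importance


# Hypothetical sample data: an on-topic page with low importance versus
# an off-topic page with high importance.
topic = ["crawler", "search", "engine"]
on_topic = comprehensive_relevance(
    ["crawler", "search", "engine", "index"], topic, importance=0.2
)
off_topic = comprehensive_relevance(["rice", "harvest", "soil"], topic, importance=0.9)
print(on_topic > off_topic)
```

With a weight that favors topic relevance, an on-topic page outranks a highly linked but unrelated page. A crawler could use such a score to order its URL frontier.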
