
Focused Crawler for Rainstorm Disaster Strategy Based on Web Space Evolutionary Algorithm (cited by: 7)
Abstract Single-objective optimization algorithms applied to the crawling problem struggle to obtain optimal weighting factors and easily fall into local optima. To address these shortcomings, multi-objective optimization is introduced into focused crawling, and a Web Space Evolutionary (WSE) algorithm based on multi-objective optimization is proposed. The seed link library is updated by computing the shortest distance between a test link and the links in the seed link library and comparing it with the average distance between all links in the library. For the problem of selecting among Pareto-optimal solutions in multi-objective optimization, a nearest-farthest candidate solution method is given. Experimental results show that, compared with algorithms such as breadth-first search, the proposed algorithm achieves higher crawling precision and stability.
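The seed-library update described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how links are represented, which distance is used, or which direction of the comparison triggers an update. Here each link is assumed to be a numeric feature vector, the distance is assumed to be Euclidean, and a test link is assumed to be admitted when its shortest distance to the library does not exceed the library's average pairwise distance. The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def shortest_distance(test_vec, seeds):
    """Smallest distance from the test link to any link in the seed library."""
    return min(dist(test_vec, s) for s in seeds)


def average_pairwise_distance(seeds):
    """Mean distance over all unordered pairs of links in the seed library."""
    pairs = list(combinations(seeds, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def update_seed_library(test_vec, seeds):
    """Return the seed library, extended by the test link if it is close enough.

    Assumption (not stated in the abstract): the link is admitted when its
    shortest distance to the library is at most the average pairwise distance.
    """
    if len(seeds) < 2:
        raise ValueError("need at least two seed links to compute an average")
    if shortest_distance(test_vec, seeds) <= average_pairwise_distance(seeds):
        return seeds + [test_vec]  # new list; the input library is not mutated
    return seeds


if __name__ == "__main__":
    # Hypothetical two-dimensional feature vectors for three seed links.
    library = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(len(update_seed_library((0.5, 0.5), library)))
    print(len(update_seed_library((5.0, 5.0), library)))
```

Returning a new list rather than mutating the library keeps the rule easy to test in isolation; a real crawler would likely also cap the library size, which the abstract does not discuss.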
Authors LIU Jingfa (刘景发); LI Xin (李新); JIANG Shengyi (蒋盛益) (College of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China; College of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China)
Source Computer Engineering (《计算机工程》); indexed in CAS, CSCD, and the Peking University Core Journals list; 2019, No. 2, pp. 184-190 (7 pages)
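Funding National Natural Science Foundation of China (61373016); Major Bidding Project of the National Social Science Fund of China (16ZDA047); Natural Science Foundation of Jiangsu Province (BK20171458, BK20181409)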
Keywords multi-objective optimization; focused crawler; Web Space Evolutionary (WSE) algorithm; Pareto optimal; rainstorm disaster
