Abstract
Focused-crawler search strategies that rely on a single value evaluation suffer from problems such as topic drift. To address this, the paper exploits the intelligence of the Bloch quantum evolutionary algorithm (BQEA) and proposes a new crawling algorithm for focused crawlers. The algorithm takes into account how web pages are distributed on the Internet, draws on the strengths of two evaluation criteria, immediate value and future value, and adjusts the weight of each criterion in the combined value online, according to how the search proceeds while the crawler is actually running. Simulation experiments show that, compared with single-value search strategies, the BQEA-based algorithm achieves higher page recall and information precision, handles the existing problems well, and exhibits a degree of self-adaptivity.
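The idea of blending immediate and future value with a weight that is adjusted online can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not give the BQEA update, so the adjustment rule below is a simple threshold-based stand-in, and the class name `WeightedScorer`, the parameters `step` and `target_rate`, and the direction of the adjustment are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WeightedScorer:
    """Blend immediate and future value of a link (hypothetical helper)."""

    w_immediate: float = 0.5  # share of immediate value in the combined value
    step: float = 0.05        # assumed size of each online adjustment
    target_rate: float = 0.5  # assumed target fraction of relevant pages

    def combined_value(self, immediate: float, future: float) -> float:
        # Convex combination: the two weights always sum to 1.
        return self.w_immediate * immediate + (1.0 - self.w_immediate) * future

    def update(self, recent_relevant: int, recent_fetched: int) -> None:
        """Adjust the weight from recent crawl feedback.

        Stand-in rule, NOT the BQEA: if few recently fetched pages were
        relevant, shift weight toward future value; otherwise shift it
        toward immediate value. The direction is an assumption.
        """
        if recent_fetched == 0:
            return  # no feedback yet, keep the current weight
        rate = recent_relevant / recent_fetched
        if rate < self.target_rate:
            self.w_immediate -= self.step
        else:
            self.w_immediate += self.step
        # Keep the weight inside [0, 1].
        self.w_immediate = min(1.0, max(0.0, self.w_immediate))
```

A crawler built on this sketch would rank its frontier by `combined_value` and call `update` after each batch of fetched pages. In the paper, the search over the weights is carried out by the BQEA rather than by a fixed step.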
Source
《计算机应用研究》
CSCD
PKU Core (Peking University core journal list)
2012, No. 11, pp. 4280-4283 (4 pages)
Application Research of Computers
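Funding
China Postdoctoral Science Foundation (20090460864)
Science and Technology Research Project of the Heilongjiang Provincial Department of Education (11551015)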
Keywords
focused crawler
topic relevancy
immediate value
future value
Bloch quantum evolutionary algorithm