Abstract
To improve the precision and recall of topic-specific search in big data environments, this paper proposes a search strategy that combines Bayesian reasoning with a genetic algorithm. Bayesian reasoning is used to compute the topical relevance of documents, a genetic algorithm heuristically guides the search process, and a difference-degree parameter is introduced. The strategy was implemented on top of the Heritrix framework using the Eclipse 3.3 integrated development environment. Experimental results show that the system with the improved search strategy crawls a considerably higher proportion of on-topic pages than the original system.
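The abstract does not give the relevance formula, so the following is only a minimal sketch of how a Bayesian topical-relevance score could be computed and used to order a crawl queue. It assumes a two-class naive Bayes model with Laplace smoothing; the function names `train` and `relevance` and the toy training data are hypothetical and are not taken from the paper or from Heritrix. The genetic-algorithm guidance and the difference-degree parameter are not modelled here.

```python
import math
from collections import Counter


def train(docs):
    """Collect per-class word and document counts.

    docs: iterable of (tokens, is_on_topic) pairs (hypothetical training data).
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    return word_counts, doc_counts


def relevance(tokens, word_counts, doc_counts):
    """Return P(on-topic | tokens) under naive Bayes with Laplace smoothing."""
    vocab = set(word_counts[True]) | set(word_counts[False])
    total_docs = sum(doc_counts.values())
    log_post = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        # Smoothed class prior.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for t in tokens:
            # Smoothed likelihood; the extra +1 reserves mass for unseen words.
            score += math.log(
                (word_counts[label][t] + 1) / (total_words + len(vocab) + 1)
            )
        log_post[label] = score
    # Normalise in log space to avoid underflow on long documents.
    m = max(log_post.values())
    on = math.exp(log_post[True] - m)
    off = math.exp(log_post[False] - m)
    return on / (on + off)


if __name__ == "__main__":
    training = [
        (["bayes", "genetic", "search"], True),
        (["crawler", "search", "engine"], True),
        (["football", "score"], False),
        (["recipe", "cake"], False),
    ]
    wc, dc = train(training)
    pages = {
        "page_a": ["search", "engine"],
        "page_b": ["football", "cake"],
    }
    # Visit the pages judged most relevant first.
    for name in sorted(pages, key=lambda p: relevance(pages[p], wc, dc), reverse=True):
        print(name, round(relevance(pages[name], wc, dc), 3))
```

In a focused crawler, a score of this kind would typically be attached to each candidate URL so that the frontier expands likely on-topic pages first.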
Source
《中南民族大学学报(自然科学版)》
CAS
2014, Issue 2, pp. 89-92 (4 pages)
Journal of South-Central University for Nationalities:Natural Science Edition
Funding
Supported by the Fundamental Research Funds for the Central Universities (ZZQ10011)
Keywords
search engine
search strategy
Bayesian reasoning
genetic algorithm