Abstract
NonHogSearch, a topic-specific search engine built on an improved best-first search algorithm, was designed and implemented. It reduces the greediness of the Web spider's search, and it introduces a Web page signal-to-noise ratio to judge whether a page belongs to the search topic. Further, the NonHogSearch Web spider performs online-incremental adaptive learning: it updates link weights automatically while crawling, and when an on-topic page is found, the resulting reward is fed back in reverse along the link chain to update the Q value of every link on that chain. This keeps the spider from prematurely settling into a locally optimal subspace of the Web search space, and searching multiple link chains in parallel further improves the engine's performance. Experiments show that the algorithm outperforms the compared methods to some degree in both recall and precision.
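The crawl loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract gives no formulas, so the following are all hypothetical: the signal-to-noise definition (topic-term tokens over other tokens), the discounted Q update with `alpha` and `gamma`, the epsilon-greedy rule used here to damp greediness, and every function and parameter name (`page_snr`, `backpropagate`, `crawl`, `epsilon`, `snr_threshold`). The sketch runs on an in-memory link graph and does not model the parallel multi-chain search.

```python
import random
import re


def page_snr(text, topic_terms):
    """Hypothetical page signal-to-noise ratio: topic-term tokens over other tokens.

    The +1 in the denominator is an assumption that avoids division by zero.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    signal = sum(1 for tok in tokens if tok in topic_terms)
    noise = len(tokens) - signal
    return signal / (noise + 1)


def backpropagate(q, chain, reward, alpha=0.5, gamma=0.8):
    """Feed a reward backwards along the link chain, updating every link's Q value.

    Assumed update rule: the last link (closest to the on-topic page) targets the
    full reward, and each earlier link targets it discounted by gamma per hop.
    """
    n = len(chain)
    for k, link in enumerate(chain):
        target = reward * gamma ** (n - 1 - k)
        old = q.get(link, 0.0)
        q[link] = old + alpha * (target - old)


def crawl(graph, pages, seed, topic_terms, max_pages=20,
          epsilon=0.2, snr_threshold=0.5, rng=None):
    """Best-first crawl over an in-memory graph with epsilon-greedy link selection."""
    rng = rng or random.Random(0)
    q = {}                                  # link (src, dst) -> Q value
    visited, order, on_topic = {seed}, [], []
    # Frontier entry: (link, chain of links from the seed up to and including it).
    frontier = [((seed, v), ((seed, v),)) for v in graph.get(seed, [])]

    def priority(chain):
        # A fresh link has no Q of its own yet, so it inherits the Q of the
        # link that led to its source page.
        own = q.get(chain[-1], 0.0)
        parent = q.get(chain[-2], 0.0) if len(chain) > 1 else 0.0
        return max(own, parent)

    while frontier and len(visited) < max_pages:
        if rng.random() < epsilon:          # explore (assumed greed-damping rule)
            i = rng.randrange(len(frontier))
        else:                               # exploit: best-first by Q value
            i = max(range(len(frontier)), key=lambda j: priority(frontier[j][1]))
        (_, url), chain = frontier.pop(i)
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        if page_snr(pages.get(url, ""), topic_terms) >= snr_threshold:
            on_topic.append(url)
            backpropagate(q, chain, reward=1.0)
        for nxt in graph.get(url, []):
            if nxt not in visited:
                frontier.append(((url, nxt), chain + ((url, nxt),)))
    return order, on_topic, q


if __name__ == "__main__":
    graph = {"seed": ["a", "b"], "a": ["a1"], "b": ["b1"]}
    pages = {"a": "spider crawler search", "a1": "search engine spider",
             "b": "cooking recipe pasta", "b1": "pasta sauce"}
    order, hits, q = crawl(graph, pages, "seed",
                           {"spider", "crawler", "search", "engine"}, epsilon=0.0)
    print(order, hits)  # ['a', 'a1', 'b', 'b1'] ['a', 'a1']
```

On the toy graph, the first on-topic hit raises the Q value of the seed-to-a link, so the out-link of page a is visited before page b. This shows how the reward feedback steers the crawl. Raising `epsilon` trades some of that guidance for exploration.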
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2007, Issue 11, pp. 2857-2859 (3 pages)
Funding
Natural Science Foundation of Guangdong Province (06025383)
Keywords
topic-specific Web spider
best-first search algorithm
online-incremental adaptive learning
signal-to-noise ratio of Web page