
Design and implementation of adaptive best-first Web spider (cited by: 1)
Abstract NonHogSearch, a topic-specific search engine, improves the search process of a Web spider that uses the best-first search algorithm and limits how greedy the search is. It introduces the signal-to-noise ratio of a Web page to judge whether a page belongs to the search topic. Furthermore, the NonHogSearch spider performs online incremental adaptive learning: it updates link weights automatically while crawling, and when it reaches an on-topic page it generates a reward and feeds that reward backward along the link chain, updating the Q value of every link on the chain. This keeps the spider from being trapped prematurely in a locally optimal subspace of the Web search space. Multiple link chains are searched at the same time in parallel, which improves the performance of the search engine. Experiments confirm that the algorithm has some advantage in both recall and precision.
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2007, No. 11, pp. 2857-2859 (3 pages)
Funding Natural Science Foundation of Guangdong Province (06025383)
Keywords topic-specific Web spider; best-first search algorithm; online-incremental adaptive learning; signal-to-noise ratio of Web page
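
The reward feedback described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's method: the abstract gives no formulas, so the discounted update rule, the parameters `gamma`, `alpha`, and `greed`, and the function names `propagate_reward` and `pick_link` are all assumptions chosen for illustration. The sketch shows a reward from an on-topic page being fed backward along a link chain, and a link choice that is only partly greedy.

```python
import random


def propagate_reward(q, chain, reward, gamma=0.5, alpha=0.5):
    """Feed a reward backward along a link chain and update each link's Q value.

    q      -- dict mapping a link (source, target) to its current Q value
    chain  -- links ordered from the seed page to the on-topic page
    reward -- reward produced by the on-topic page at the end of the chain
    gamma  -- assumed discount per step away from the on-topic page
    alpha  -- assumed learning rate
    """
    for steps_back, link in enumerate(reversed(chain)):
        # Links farther from the on-topic page receive a smaller share.
        target = reward * (gamma ** steps_back)
        old = q.get(link, 0.0)
        q[link] = old + alpha * (target - old)
    return q


def pick_link(frontier, q, greed=0.8, rng=random):
    """Pick the next link to crawl from a list of candidate links.

    With probability `greed` take the link with the highest Q value
    (best-first); otherwise take a random link, so the crawl is not
    locked into one locally promising region too early.
    """
    if rng.random() < greed:
        return max(frontier, key=lambda link: q.get(link, 0.0))
    return rng.choice(frontier)


if __name__ == "__main__":
    q_values = {}
    chain = [("seed", "hub"), ("hub", "topic_page")]
    propagate_reward(q_values, chain, reward=1.0)
    print(q_values)
    print(pick_link(chain, q_values, greed=1.0))
```

Lowering `greed` makes the crawl explore more; how NonHogSearch actually limits greed, and how it computes the signal-to-noise ratio that triggers a reward, is not stated in the abstract.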