Design and implementation of focused Web crawler based on semantic analysis
Cited by: 14
Abstract: This paper presents the design and implementation of a Semantic Analysis Focused Web Crawler (SAFWC) and proposes a link-value prediction algorithm, SPageRank. Working at the semantic level and drawing on HowNet, the algorithm judges the topical relevance of extended metadata in order to select and predict URLs relevant to the topic. Experimental results show that the system achieves relatively high crawling efficiency and precision for Web pages relevant to a predefined set of topics.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 2, pp. 406-408 (3 pages)
Keywords: focused Web crawler; HowNet; extended metadata; crawling strategy
