
The Design and Implementation of Focused-spider (主题蜘蛛的设计与实现)

Cited by: 3
Abstract: Based on how multimedia resources are distributed on the Web, this paper designs and implements a focused spider that collects Web pages containing multimedia resources (audio, video, and Flash animation). The spider combines three layers of filtering (link type filtering, page content filtering, and link content filtering) with four storage mechanisms (temporary page storage, target page storage, intermediate link storage, and update storage). Experimental results show that the focused spider effectively improves precision.
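Authors: 宋宇 (Song Yu), 孟祥增 (Meng Xiangzeng)
Source: Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》), CAS, 2007, No. 2, pp. 42-45, 49 (5 pages in total)
Funding: Natural Science Foundation of Shandong Province, grant no. y2005G21
Keywords: focused-spider (主题蜘蛛); link filter (链接过滤); content filter (内容过滤)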
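
The abstract names three filters and four stores but gives no implementation detail. The following Python fragment is only a minimal sketch of how such a pipeline might be wired together, not the authors' method. The extension lists, the topic keywords, the `Stores` class, the function names, and the reading of the "update store" as a per-URL processing counter are all assumptions made for illustration. Relative links are ignored to keep the sketch short.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Assumed lists: the abstract only names the media kinds (audio, video, Flash).
MEDIA_EXTENSIONS = (".mp3", ".wav", ".avi", ".mpg", ".wmv", ".swf")
SKIPPED_EXTENSIONS = (".css", ".js", ".zip", ".exe", ".pdf")
TOPIC_WORDS = ("audio", "video", "flash", "music", "movie")  # assumed keywords

# Simplified anchor matcher; a real crawler would use an HTML parser.
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)


@dataclass
class Stores:
    """Hypothetical container for the four stores named in the abstract."""
    temporary_pages: dict = field(default_factory=dict)  # url -> html under analysis
    target_pages: dict = field(default_factory=dict)     # url -> html with media
    middle_links: list = field(default_factory=list)     # links kept for later crawling
    updates: dict = field(default_factory=dict)          # url -> times processed (assumed)


def link_type_ok(url):
    """Link type filter: keep absolute http(s) links that look like pages."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    rejected = SKIPPED_EXTENSIONS + MEDIA_EXTENSIONS
    return parsed.scheme in ("http", "https") and not path.endswith(rejected)


def page_has_media(html):
    """Page content filter: does the page reference any media file?"""
    lowered = html.lower()
    return any(ext in lowered for ext in MEDIA_EXTENSIONS)


def link_content_ok(anchor_text):
    """Link content filter: does the anchor text mention the topic?"""
    lowered = anchor_text.lower()
    return any(word in lowered for word in TOPIC_WORDS)


def process_page(url, html, stores):
    """Run one fetched page through the three filters and four stores."""
    stores.temporary_pages[url] = html
    stores.updates[url] = stores.updates.get(url, 0) + 1
    if page_has_media(html):
        stores.target_pages[url] = html
    for href, text in LINK_RE.findall(html):
        if link_type_ok(href) and link_content_ok(text):
            if href not in stores.middle_links:
                stores.middle_links.append(href)
    del stores.temporary_pages[url]  # analysis finished, drop the temporary copy
```

In this sketch, a page lands in `target_pages` only when the page content filter finds a media reference, while an outgoing link reaches `middle_links` only if it passes both the link type filter and the link content filter.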

References (8)

  • 1. Menczer F. Complementing search engines with online Web mining agents[J]. Decision Support Systems, 2003, 35(2): 195-212.
  • 2. 欧阳柳波 (Ouyang Liubo), 李学勇 (Li Xueyong), 李国徽 (Li Guohui), 王鑫 (Wang Xin). 专业搜索引擎搜索策略综述 (A survey of search strategies for specialized search engines)[J]. 计算机工程 (Computer Engineering), 2004, 30(13): 32-33. Cited by: 34
  • 3. De Bra P, Houben G, Kornatzky Y, et al. Information retrieval in distributed hypertexts[C]//Proceedings of the 4th RIAO Conference. New York, 1994: 481-491.
  • 4. Cho J, Garcia-Molina H, Page L. Efficient crawling through URL ordering[J]. Computer Networks, 1998, 30(1-7): 161-172.
  • 5. 曾春 (Zeng Chun), 邢春晓 (Xing Chunxiao), 周立柱 (Zhou Lizhu). 基于内容过滤的个性化搜索算法 (A personalized search algorithm based on content filtering)[J]. 软件学报 (Journal of Software), 2003, 14(5): 999-1004. Cited by: 118
  • 6. Bharat K, Henzinger M R. Improved algorithms for topic distillation in a hyperlinked environment[C]//Proceedings of SIGIR Conference on Research and Development in Information Retrieval. New York, 1998: 104-111.
  • 7. Rennie J, McCallum A. Using reinforcement learning to spider the Web efficiently[C]//Proceedings of the International Conference on Machine Learning (ICML 99). Bled, Slovenia, 1999: 335-343.
  • 8. Diligenti M, Coetzee F M, Lawrence S, et al. Focused crawling using context graphs[C]//Proceedings of the International Conference on Very Large Database (VLDB00). Cairo, Egypt, 2000: 527-534.

Secondary references (11)

  • 1. Menczer F. Complementing Search Engines with Online Web Mining Agents[J]. Decision Support Systems, 2003, 35(2): 195-212
  • 2. De Bra P, Houben G, Kornatzky Y, et al. Information Retrieval in Distributed Hypertexts[C]. In: Proc. of the 4th RIAO Conference, 1994
  • 3. Hersovici M, Heydon A, Mitzenmacher M, et al. The Shark-search Algorithm-An Application: Tailored Web Site Mapping[C]. In: Proc. of the World-Wide Web Conference, 1998
  • 4. Cho J, Garcia-Molina H, Page L. Efficient Crawling Through URL Ordering[J]. Computer Networks, 1998, 30(1-7): 161-172
  • 5. Rennie J, McCallum A. Using Reinforcement Learning to Spider the Web Efficiently[C]. In: Proc. of the International Conference on Machine Learning (ICML 99), 1999
  • 6. Diligenti M, Coetzee F M, Lawrence S, et al. Focused Crawling Using Context Graphs[C]. In: Proc. of the International Conference on Very Large Database (VLDB00), 2000
  • 7. Bharat K, Henzinger M R. Improved Algorithms for Topic Distillation in a Hyperlinked Environment[C]. In: Proc. of SIGIR Conference on Research and Development in Information Retrieval, 1998
  • 8. Aggarwal C, Al-Garawi F, Yu S P. Intelligent Crawling on the World Wide Web with Arbitrary Predicates[C]. In: Proc. of the 10th International World Wide Web Conference, 2001
  • 9. Ester M, Groß M, Kriegel H. Focused Web Crawling: A Generic Framework for Specifying the User Interest and for Adaptive Crawling Strategies[C]. In: Proc. of the International Conference on Very Large Database (VLDB01), 2001
  • 10. Chakrabarti S, Punera K, Subramanyam M. Accelerated Focused Crawling Through Online Relevance Feedback[C]. In: Proc. of the 11th International World Wide Web Conference, 2001

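Co-citing documents (sharing references with this paper): 147

Co-cited documents (cited together with this paper): 12

Citing documents: 3

Secondary citing documents: 3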
