Topic-Specific Web Information Collection Technology (主题Web信息采集技术)

Cited by: 1

Abstract: With the rapid growth of the Internet, search engines have become the main tool people use to find information on the Web. A traditional general-purpose search engine uses a crawler program to collect information from the entire Web. This approach has several drawbacks: collection is untargeted, the rate of dead pages is high, and the results cannot meet the needs of specific professional groups. What is needed instead is a topic-focused search engine with fine-grained and accurate classification, comprehensive and in-depth data, and timely updates.

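Author: Du Huan (杜欢)

Affiliation: School of Computer Science, Chongqing University of Posts and Telecommunications

Source: Journal of Sichuan University of Science & Engineering (Natural Science Edition) (《四川理工学院学报（自然科学版）》), CAS, 2007, No. 5, pp. 10-13 (4 pages)

Keywords: search engine; Web crawler; focused search engine

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]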


