Research on an Implementation Method for an Image-Topic Web Crawler (cited by: 2)

Design and Implementation of a Web Crawler for Images


Abstract: This paper designs and implements a topical web crawler for images. It adopts a heuristic method based on textual content: the topical relevance of an image is judged from the anchor text of the image file and the context surrounding it, so that relevant image resources are fetched more accurately. Topical relevance is also judged for web pages, so that the crawler's crawl path can be guided more effectively. Experiments show that the system achieves some degree of optimization and lays a good foundation for topic-directed collection of image information.
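The relevance judgment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the term-overlap score, the `window` of neighbouring text chunks, the `anchor_weight` of 0.7, and all function and class names are assumptions made for illustration, and the ASCII-only tokenizer would have to be replaced by a proper word segmenter for Chinese pages.

```python
import re
from html.parser import HTMLParser


def tokenize(text):
    """Lowercase ASCII word tokens (assumption: stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(text, topic_terms):
    """Fraction of topic terms that occur in the text, from 0.0 to 1.0."""
    if not topic_terms:
        return 0.0
    tokens = set(tokenize(text))
    return sum(1 for term in topic_terms if term in tokens) / len(topic_terms)


class ImageContextParser(HTMLParser):
    """Collects visible text chunks and, per <img>, its anchor text and position."""

    def __init__(self):
        super().__init__()
        self.chunks = []        # visible text chunks in document order
        self.images = []        # dicts with keys: src, anchor, pos
        self._open_link = None  # indexes of images inside the open <a>, if any
        self._link_text = []    # text seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._open_link, self._link_text = [], []
        elif tag == "img" and attrs.get("src"):
            self.images.append({"src": attrs["src"],
                                "anchor": attrs.get("alt") or "",
                                "pos": len(self.chunks)})
            if self._open_link is not None:
                self._open_link.append(len(self.images) - 1)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_link is not None:
            # Text of the enclosing link counts as anchor text of its images.
            link_text = " ".join(self._link_text)
            for index in self._open_link:
                self.images[index]["anchor"] += " " + link_text
            self._open_link = None

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.chunks.append(data)
            if self._open_link is not None:
                self._link_text.append(data)


def score_images(html, topic_terms, window=2, anchor_weight=0.7):
    """Return {src: score}, mixing anchor-text and context relevance."""
    parser = ImageContextParser()
    parser.feed(html)
    parser.close()
    scores = {}
    for img in parser.images:
        start = max(0, img["pos"] - window)
        context = " ".join(parser.chunks[start:img["pos"] + window])
        scores[img["src"]] = (anchor_weight * relevance(img["anchor"], topic_terms)
                              + (1 - anchor_weight) * relevance(context, topic_terms))
    return scores


def page_relevance(html, topic_terms):
    """Relevance of the whole page text; could rank links in a crawl frontier."""
    parser = ImageContextParser()
    parser.feed(html)
    parser.close()
    return relevance(" ".join(parser.chunks), topic_terms)
```

A crawler built along these lines might download only images whose score exceeds a threshold, and might use `page_relevance` as the priority of the links found on a page, so that links from on-topic pages are visited first. Both the threshold and the priority scheme are design choices that the abstract does not specify.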

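Authors: Zhu Xuefang, Han Zhanxiao

Affiliation: Department of Information Management, Nanjing University

Source: Journal of Nanjing Normal University (Engineering and Technology Edition), CAS, 2008, No. 4, pp. 115-117, 166 (4 pages)

Keywords: link anchor text; link context; web crawler; JXTA; topical crawler

Classification: TP309 [Automation and Computer Technology: Computer System Architecture]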


References (4)

[1] De Bra P, Houben G, Kornatzky Y, et al. Information retrieval in distributed hypertexts [C] // Proc. of the 4th RIAO Conference. New York, 1994: 481-491.
[3] Chakrabarti S, Punera K, Subramanyam M. Accelerated focused crawling through online relevance feedback [C] // Proc. of the 11th International World Wide Web Conference. Hawaii: [s.n.], 2002.
[5] Brin S, Page L. The anatomy of a large-scale hypertextual Web search engine [C] // Proc. of the 7th World Wide Web Conference. [s.n.], 1998: 146-164.
[6] Lucene [EB/OL]. http://lucene.apache.org/, 2008-07-21.

Co-cited literature (17)

1. Zhou Lizhu, Lin Ling. A survey of focused crawler technology [J]. Journal of Computer Applications, 2005, 25(9): 1965-1969. Cited by: 156
2. Gu Jinrui, Wang Fang. A review of research on ontology [J]. Information Science, 2007, 25(6): 949-956. Cited by: 19
3. Takahashi T, Soonsang H, Taura K, et al. World Wide Web Crawler [OL]. [2009-05-09]. http://www.2002.org/CDROM/poster/182/.
4. Shkapenyuk V, Suel T. Design and implementation of a high-performance distributed Web crawler [C] // Proceedings of the 18th International Conference on Data Engineering, April 2002: 357-368.
5. Sing L. JXTA 2: A High-Performance, Massively Scalable P2P Network [OL]. [2009-05-09]. http://www.ibm.com/developerworks/java/library/j-jxta2/.
6. De Bra P, Houben G, Kornatzky Y, et al. Information retrieval in distributed hypertexts [C] // Proceedings of the 4th RIAO Conference. New York, 1994: 481-493.
7. Chakrabarti S, Punera K, Subramanyam M. Accelerated focused crawling through online relevance feedback [C] // Proceedings of the 11th International World Wide Web Conference. Hawaii, 2002: 148-159.
8. Jakarta Common HttpClient [OL]. [2008-03-01]. http://hc.apache.org/httpclient-3.x/.
9. Najork M, Heydon A. High-Performance Web Crawling, COMPAQ System Research Center (SRC), Research Report [R]. Kluwer Academic Publishers, September 2001.
10. Dnsjava [OL]. [2008-03-24]. http://www.dnsjava.org/.

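Citing literature (2)

1. Zhu Xuefang, Han Zhanxiao. Design and implementation of a P2P-based distributed topical crawler system [J]. Journal of the China Society for Scientific and Technical Information, 2010, 29(3): 402-407. Cited by: 6
2. Qin Xueyong. Research on constructing a subject ontology based on Internet resources [J]. Journal of Langfang Teachers College (Natural Science Edition), 2011, 11(2): 21-23.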

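Second-level citing literature (6)

1. Zhu Xuefang, Feng Xixi. Design and implementation of an agriculture-oriented topical search engine [J]. Journal of Anhui Agricultural Sciences, 2011, 39(35): 22183-22186. Cited by: 1
2. Zhu Xuefang, Feng Xixi. Research on text-content-based information extraction and classification for agricultural web pages [J]. Information Science, 2012, 30(7): 1012-1015. Cited by: 3
3. Hu Xinming, Xia Huosong. Research on methods for identifying users' product attribute preferences in online reviews [J]. Journal of Intelligence, 2012, 31(9): 197-201. Cited by: 5
4. Lin Qin. An information recommendation system based on web page clipping [J]. Ludong University Journal (Natural Science Edition), 2012, 28(4): 319-321. Cited by: 4
5. Huang Wei, Jin Yabo, Hu Changlong. Research on topical information collection for online public opinion [J]. New Technology of Library and Information Service, 2012(11): 65-71. Cited by: 10
6. Liu Qihua. Research on competitive intelligence collection based on LDA and domain ontology [J]. Information Science, 2013, 31(4): 51-55. Cited by: 4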

