Abstract
This paper presents the design and implementation of a topic-focused web crawler for images. It adopts a heuristic method based on textual content: the topical relevance of an image is judged from the anchor text of the image file and its surrounding context, so that relevant image resources are fetched more accurately. Topical relevance is also judged for web pages, so as to guide the crawler's crawl path more effectively. Experiments show that the system achieves a certain degree of optimization and lays a good foundation for collecting image information on targeted topics.
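The two relevance judgments described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's actual method: the topic vocabulary `TOPIC_TERMS`, the term-overlap measure in `relevance`, the `anchor_weight` value, and the `Frontier` class are all hypothetical.

```python
import heapq
import re

# Hypothetical topic vocabulary; the paper's actual term set is not given.
TOPIC_TERMS = {"panda", "bamboo", "zoo"}


def relevance(text, topic_terms=TOPIC_TERMS):
    """Fraction of topic terms that occur in the text (an assumed measure)."""
    if not topic_terms:
        return 0.0
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(words & topic_terms) / len(topic_terms)


def image_score(anchor_text, context, anchor_weight=0.6):
    """Combine anchor-text and surrounding-context relevance for one image.

    The 0.6 / 0.4 weighting is an assumption, not a value from the paper.
    """
    return (anchor_weight * relevance(anchor_text)
            + (1 - anchor_weight) * relevance(context))


class Frontier:
    """Priority queue of URLs that yields the most relevant page first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, page_text):
        # Skip URLs that were already queued.
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-relevance(page_text), url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score
```

In this sketch an image would be downloaded only when `image_score` exceeds some chosen threshold, and the crawler would always visit next the URL returned by `Frontier.pop`.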
Source
《南京师范大学学报(工程技术版)》
CAS
2008, Issue 4, pp. 115-117, 166 (4 pages in total)
Journal of Nanjing Normal University (Engineering and Technology Edition)