Abstract
Because digital resources are convenient to access, they have become an important source of information for literature searches and research. Access to them, however, depends on network conditions. To keep digital resources working normally, this paper proposes a design for digital resource detection software based on a web crawler. An overall framework for the detection software is constructed first, and the crawler-based software is designed on top of it. An information collection module is designed so that the web crawler can copy all of the digital resources; visual information extraction is then performed on the collected data so that information can be retrieved whenever it is needed, and the final detection of the digital resources is carried out with crawler technology. In an information-crawling experiment run on identical hardware, the crawler-based detection software was compared with traditional manual detection and with self-developed detection software. The results show that the crawler-based software crawled the most information per unit time.
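The abstract does not give implementation details, so the following is only a minimal illustrative sketch of how crawler-based resource detection might look, not the author's software. It assumes that "detection" means checking whether each linked resource is reachable over HTTP. All names (`LinkExtractor`, `http_fetch`, `check_resources`, `max_pages`) are hypothetical and use only the Python standard library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (a stand-in for information extraction)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url, timeout=10):
    """Default fetcher (assumption): returns (status, body); status 0 means unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:  # the server answered, but with an error status
        return err.code, ""
    except (URLError, OSError, ValueError):  # DNS failure, timeout, bad URL, ...
        return 0, ""


def check_resources(start_url, fetch=http_fetch, max_pages=50):
    """Breadth-first crawl within the start host; returns {url: status_code}."""
    host = urlparse(start_url).netloc
    results = {}
    queue = deque([start_url])
    while queue and len(results) < max_pages:
        url = queue.popleft()
        if url in results:
            continue  # already checked
        status, body = fetch(url)
        results[url] = status
        if status != 200:
            continue  # record the failure, but there is nothing to parse
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in results:
                queue.append(absolute)
    return results
```

Passing the fetcher as a parameter keeps the crawl logic separate from the network, so the traversal can be exercised against canned pages; a real deployment would also need rate limiting and robots.txt handling, which this sketch omits.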
Author
田宇浩 (TIAN Yuhao), School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China
Source
Information & Computer (《信息与电脑》)
2021, No. 17, pp. 124-126 (3 pages)
Keywords
web crawler
digital resource
software development
detection software
information collection
information extraction