Abstract
To address the low success rate and long latency of traditional data-crawling methods, an internet data-crawling method based on Python crawler technology is proposed. First, an objective function is defined and the key features of the network data are obtained by calculation. Second, the phase-space pattern of the crawler network is established to obtain the dimension of the Python crawler. Finally, the breadth-first method is used to crawl all information data in the initial data and to find the corresponding equilibrium point. Experimental results show that the proposed method achieves the highest crawl success rate and the shortest latency.
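The breadth-first step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the objective function, the phase-space construction, or the equilibrium-point computation, so none of those are modeled here. The names `bfs_crawl`, `fetch_links`, `max_pages`, and `TOY_GRAPH` are hypothetical, and link extraction is replaced by a lookup in a small in-memory graph so that the example runs without network access.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

def bfs_crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> List[str]:
    """Visit pages in breadth-first order, starting from the seed URLs.

    fetch_links is a hypothetical callback that returns the outgoing
    links of a page; a real crawler would download and parse the page.
    """
    queue = deque()
    seen = set()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            queue.append(url)

    order: List[str] = []
    while queue and len(order) < max_pages:
        url = queue.popleft()          # FIFO queue gives level-by-level order
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:       # enqueue each page at most once
                seen.add(link)
                queue.append(link)
    return order

# Assumed toy link graph standing in for real web pages.
TOY_GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "a"],
    "d": [],
}

def toy_fetch_links(url: str) -> List[str]:
    """Return the outgoing links of a page in the toy graph."""
    return TOY_GRAPH.get(url, [])

if __name__ == "__main__":
    print(bfs_crawl(["a"], toy_fetch_links))
```

Passing the link-fetching function as a parameter keeps the traversal logic separate from network and parsing concerns, so the same loop could be driven by a real HTTP client in place of `toy_fetch_links`.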
Author
WANG Fang (王芳), Gandong College, Fuzhou, Jiangxi 344000, China
Source
Information & Computer (《信息与电脑》), 2023, No. 7, pp. 41-43 (3 pages)
Funding
Science and Technology Project of the Jiangxi Provincial Department of Education (Project No. GJJ218604).