摘要
文章提出一种基于支持向量机(Support Vector Machine,SVM)的动态网页识别方法,并结合Scrapy开源网络爬虫框架构建了动态网页的网络爬虫,实现了对动态网页的高效识别和内容抓取。以httpbin.org为测试网站,使用SVM模型对静态和动态网页进行分类,随后利用Scrapy框架动态调整抓取策略,验证了该方法的可行性和有效性。
This paper proposes a dynamic web page recognition method based on Support Vector Machine(SVM),and combines it with the Scrapy open source web crawler framework to build a web crawler for dynamic web pages,achieving efficient recognition and content capture of dynamic web pages.This paper uses httpbin.org as the test website,uses the SVM model to classify static and dynamic web pages,and then uses the Scrapy framework to dynamically adjust the crawling strategy to verify the feasibility and effectiveness of this method.
作者
刘君良
栾永明
赵建楠
任川
LIU Junliang;LUAN Yongming;ZHAO Jiannan;REN Chuan(Liaoning Provincial Meteorological Information Center,Shenyang Liaoning 110000,China)
出处
《信息与电脑》
2024年第4期185-187,共3页
Information & Computer