Abstract
When crawling web data, many pages are not static HTML documents but instead contain a great deal of JavaScript, so traditional crawler methods cannot effectively obtain the required information. This paper uses Selenium, driven from the Python language, to simulate browser access to websites and crawl data on big-data-related positions from Lagou.com. The job-requirement data for three positions (big data development engineer, big data R&D engineer, and big data architect) are analyzed and visualized with word clouds. The results can provide a basis for formulating the training program of the Data Science and Big Data Technology major and for designing the teaching hours of related courses.
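The word-cloud analysis described above rests on tallying how often each skill keyword appears across the crawled job requirements. A minimal sketch of that tallying step is shown below; the sample postings are hypothetical stand-ins for text the paper actually crawls from Lagou.com with Selenium, and the function name `keyword_frequencies` is an illustrative choice, not from the paper.

```python
from collections import Counter
import re

# Hypothetical sample of job-requirement snippets; the paper obtains
# real postings from Lagou.com via Selenium (browser automation not shown).
postings = [
    "Hadoop Spark Hive experience required",
    "Spark streaming and Kafka experience",
    "Hadoop cluster tuning, Hive SQL",
]

def keyword_frequencies(texts):
    """Count keyword occurrences across postings; the resulting
    frequency table is the typical input to a word-cloud renderer."""
    words = []
    for text in texts:
        # Lowercase and split on non-letter characters.
        words.extend(re.findall(r"[a-z]+", text.lower()))
    return Counter(words)

freqs = keyword_frequencies(postings)
print(freqs.most_common(3))
```

A frequency table like `freqs` can then be passed to a word-cloud library (the Python `wordcloud` package, for example, accepts such a mapping via `generate_from_frequencies`) to produce the visualizations used in the paper.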
Source
《工业控制计算机》
2020, No. 2, pp. 109-111 (3 pages)
Industrial Control Computer