Continuous Crawling and Analysis of Job Data in Recruitment Website Based on Distributed Docker Cluster

Times cited: 2
Abstract: Recruitment information on many large online job-hunting platforms is inconsistent, so job seekers struggle to find suitable, accurate postings among abundant and redundant listings. Using web crawler technology, this paper designs and implements continuous crawling, analysis, and visual presentation of job data from recruitment websites on a distributed Docker container cluster. First, the Swarm container management tool is used to build a Docker cluster spanning multiple physical hosts. Then, Python's Scrapy framework runs a continuous, distributed web crawler over the unstructured job information of mainstream recruitment websites, covering URL deduplication and data collection, extraction, and cleaning, and produces a MySQL database of job postings. Finally, the job database is analyzed and mined to generate a heat map of job-count distribution, job skill profiles, and statistical charts for visual display, giving job seekers an intuitive reference for job information.
Authors: ZHANG Liang-bin (张梁斌), CHAI Hui (柴晖), WANG Yuan-ming (王渊明), WAN Jian (万健)
Affiliations: Zhejiang Wanli University, Ningbo, Zhejiang 315100; Hangzhou Dianzi University, Hangzhou, Zhejiang 310018
Source: Journal of Zhejiang Wanli University (《浙江万里学院学报》), 2019, No. 2, pp. 85-90 (6 pages)
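Funding: 2018 Teacher Professional Development Project for Visiting Scholars in Higher Education Institutions (FX2018050); 2018 Zhejiang Provincial Science and Technology Innovation Program for College Students (Xinmiao Talent Program) (2018R420016).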
Keywords: job recruitment; web crawler; Docker cluster; Swarm; visualization
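The abstract does not include the authors' code. The following is a hedged, stdlib-only sketch of three steps it names: URL deduplication, field cleaning, and a keyword-count skill profile. It is not the authors' implementation and does not use Scrapy itself. Scrapy's built-in duplicate filter also works on request fingerprints, and a distributed crawl commonly keeps the fingerprint set in shared storage such as Redis (as the scrapy-redis extension does); here a local set stands in for that store. The names `UrlDeduplicator`, `clean_job`, and `skill_profile`, the "10k-15k" salary format, the skill vocabulary, and the example URLs are all hypothetical.

```python
from __future__ import annotations

import hashlib
import re
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Normalize a URL so trivially different spellings share one key:
    lowercase scheme and host, sort query parameters, drop the fragment."""
    parts = urlsplit(url.strip())
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )


class UrlDeduplicator:
    """Fingerprint-based URL deduplication (hypothetical helper).

    A local set stands in for the shared store (e.g. a Redis set) that a
    multi-host crawl would need so every worker sees the same history.
    """

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, url: str) -> bool:
        """Record the URL's fingerprint; return True only the first time."""
        fp = hashlib.sha1(canonicalize(url).encode("utf-8")).hexdigest()
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


# Hypothetical salary format such as "10k-15k" (thousands per month).
SALARY_RE = re.compile(r"(\d+(?:\.\d+)?)\s*[kK]\s*-\s*(\d+(?:\.\d+)?)\s*[kK]")


def clean_job(raw: dict[str, str]) -> dict:
    """Collapse whitespace in every field and parse the salary range."""
    job = {key: " ".join(value.split()) for key, value in raw.items()}
    match = SALARY_RE.search(job.get("salary", ""))
    if match:
        job["salary_min"], job["salary_max"] = (
            round(float(g) * 1000) for g in match.groups()
        )
    return job


# Hypothetical skill vocabulary; a real profile would use a curated list.
SKILLS = ("python", "java", "mysql", "docker", "linux", "redis")
# ASCII-letter lookarounds so "java" does not match inside "javascript",
# while still matching next to Chinese characters (e.g. "熟悉python").
SKILL_PATTERNS = {
    skill: re.compile(rf"(?<![a-z]){re.escape(skill)}(?![a-z])") for skill in SKILLS
}


def skill_profile(descriptions: list[str]) -> Counter:
    """Count how many job descriptions mention each skill (once per posting)."""
    counts: Counter = Counter()
    for text in descriptions:
        lowered = text.lower()
        counts.update(s for s, pat in SKILL_PATTERNS.items() if pat.search(lowered))
    return counts


if __name__ == "__main__":
    dedup = UrlDeduplicator()
    urls = [
        "https://jobs.example.com/list?page=2&city=ningbo",
        "https://JOBS.example.com/list?city=ningbo&page=2#top",  # same page
        "https://jobs.example.com/list?page=3&city=ningbo",
    ]
    print(len([u for u in urls if dedup.is_new(u)]))  # 2

    job = clean_job({"title": "  Python 开发工程师 ", "salary": "10k-15k"})
    print(job["title"], job["salary_min"], job["salary_max"])  # Python 开发工程师 10000 15000

    profile = skill_profile(["熟悉Python和MySQL", "Java, JavaScript, Docker", "python, linux"])
    print(profile.most_common(1))  # [('python', 2)]
```

Keeping the fingerprint set outside the individual worker processes is what would let crawler containers on different Swarm nodes share one crawl history. Per-city job counts for the distribution heat map could be tallied the same way, e.g. `Counter(job["city"] for job in jobs)`, where `city` is again a hypothetical field name.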