Research on Distributed Parallel Acquisition Method for Urban Big Data
Cited by: 1
Abstract: Efficient distributed acquisition of web data, combined with big data analysis and artificial intelligence, can give traditional industries more scientific and accurate tools for analysis and prediction in construction and management. Taking investment cost prediction for electric power construction in Jiangsu Province as the application background, this paper studies deep web crawling using Python and the distributed crawler framework Scrapy. A crawling strategy is designed according to the structure of the deep web, and a parallel web data crawling system is implemented to collect, at large scale, macroeconomic data for the cities of Jiangsu Province, including GDP, population, enterprise classification, community construction, and transportation construction. Natural language processing and regular expressions are then used to clean and process the structured and unstructured data obtained, and the results are finally presented through data visualization.
Authors: ZHANG Zhen-yu; WANG Ting; REN Teng-yun; ZHAO Lin; WANG Ji-jun (Jiangsu Electric Power Information Technology Co., Ltd., Nanjing 215000, China)
Source: Techniques of Automation and Applications, 2023, No. 7, pp. 119-122 (4 pages)
Keywords: distributed computing; big data; crawler framework; investment cost
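The abstract names regular expressions as one of the tools used to clean the crawled data but gives no details. The following is only a minimal sketch of what such a cleaning step might look like, not the authors' implementation. The function names (`clean_text`, `extract_values`), the pattern, and the set of units (亿元, 万元, 万人) are all assumptions chosen for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: a number, with optional thousands separators and
# decimals, followed by a unit such as 亿元 (100 million yuan),
# 万元 (10,000 yuan) or 万人 (10,000 people).
VALUE_RE = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?\s*(亿元|万元|万人)")

# Assumed scale factors for converting each unit to a base quantity.
UNIT_SCALE = {"亿元": 1e8, "万元": 1e4, "万人": 1e4}


def clean_text(raw: str) -> str:
    """Strip HTML tags and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def extract_values(raw: str) -> List[Tuple[float, str]]:
    """Return (scaled value, original unit) pairs found in a page fragment."""
    text = clean_text(raw)
    results = []
    for match in VALUE_RE.finditer(text):
        digits = (match.group(1) + (match.group(2) or "")).replace(",", "")
        unit = match.group(3)
        results.append((float(digits) * UNIT_SCALE[unit], unit))
    return results
```

Calling `extract_values` on a crawled HTML fragment returns the figures normalized to base units, which could then feed a tabular store or a visualization step. Numbers without a recognized unit, such as years, are skipped.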
