Abstract
Data skew is one of the most common and intractable problems in big data computing for mechanical equipment. Mechanical equipment data are varied and complex. When the data are skewed, a large share of the computing tasks concentrates on a single node or partition; after the other nodes or partitions have finished their tasks, the skewed node still has work outstanding. This not only lengthens the computation time of the job but also raises the probability that the program runs out of memory. Cluster resource utilization and computing performance may also be low. This paper designs and builds a Spark-based big data platform for monitoring construction machinery equipment. The platform mainly handles the storage, cleaning, and business statistics of sensor data from construction machinery equipment, and it also supports custom monitoring web pages. The big data storage and computing modules are integrated with a Web visualization module so that equipment managers and business analysts can manage engineering equipment and carry out business analysis intuitively and effectively.
Authors
杨沙沙
黄艳
YANG Shasha; HUANG Yan (School of ZTE, Xi'an Traffic Engineering Institute, Xi'an 710300)
Source
《西安交通工程学院学术研究》
2022, No. 2, pp. 36-40 (5 pages)
Academic Research of Xi'an Traffic Engineering Institute
Keywords
data skew
same node
partitions
performance