Analysis and Research on Data Skew of Engineering Machinery and Equipment (基于工程机械设备数据倾斜问题分析与研究)
Abstract: Data skew is among the most common and most intractable problems in big data computing for mechanical equipment. Mechanical equipment data are varied and complex. When the data are skewed, a large share of the computing tasks concentrates on a single node or partition; after the other nodes or partitions have finished their tasks, the skewed node still has work outstanding. This lengthens the job's run time and raises the likelihood that the program runs out of memory. Cluster resource utilization and computing performance may also suffer. This paper designs and builds a Spark-based big data platform for monitoring construction machinery equipment. The platform stores and cleans equipment sensor data and computes business statistics over it, and it also supports custom monitoring web pages. The big data storage and computing modules are integrated with the Web visualization module so that equipment managers and business analysts can manage equipment and carry out business analysis intuitively and effectively.
Authors: YANG Shasha (杨沙沙), HUANG Yan (黄艳) (School of ZTE, Xi'an Traffic Engineering Institute, Xi'an 710300)
Source: Academic Research of Xi'an Traffic Engineering Institute (《西安交通工程学院学术研究》), 2022, No. 2, pp. 36-40 (5 pages)
Keywords: data skew; same node; partitions; performance
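
The skew mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it uses plain Python rather than Spark, and the device IDs, record counts, partition count, and helper names (`partition_of`, `partition_sizes`, `salt`) are all hypothetical. It hash-partitions records by device ID, shows how one very active device overloads a single partition, and then applies key salting, a commonly used mitigation that the abstract does not itself mention.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8   # assumed cluster parallelism
NUM_SALTS = 64       # assumed number of salt values per key


def partition_of(key: str, num_partitions: int) -> int:
    """Hash partitioner; crc32 keeps results stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many records each partition receives."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt(key: str, num_salts: int, rng: random.Random) -> str:
    """Append a random suffix so one hot key becomes many distinct keys."""
    return f"{key}#{rng.randrange(num_salts)}"


# Hypothetical workload: one device produces 9000 of the 10000 records.
rng = random.Random(0)
keys = ["device-001"] * 9000 + [f"device-{i:04d}" for i in range(2, 1002)]

skewed = partition_sizes(keys, NUM_PARTITIONS)
salted = partition_sizes([salt(k, NUM_SALTS, rng) for k in keys], NUM_PARTITIONS)

mean = len(keys) / NUM_PARTITIONS
print("records per partition, by device ID:", skewed)
print("records per partition, salted key:  ", salted)
print("largest partition / mean, by device ID:", round(max(skewed) / mean, 2))
print("largest partition / mean, salted key:  ", round(max(salted) / mean, 2))
```

Because every record of the hot device hashes to the same partition, that partition holds at least 9000 records while the others share the remainder. Salting spreads the hot key across partitions, at the cost of a second aggregation step to strip the salt and merge the partial results per device.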