Abstract
Single-machine ground penetrating radar (GPR) processing software performs poorly and struggles to meet the demands of massive data processing in long-mileage or periodic inspection scenarios. To address this problem, this paper works in cluster mode with the MapReduce parallel computing framework on the Hadoop platform, adopts a hybrid storage method combining HDFS and MySQL, slices the data stream at fine granularity, and establishes a dynamic scheduling mode with a master-slave node architecture, achieving load-balanced parallel processing of massive inspection data. A Hadoop cluster with 1 master node and 8 slave nodes was built on Linux and tested. The results show that dynamic scheduling enables iterative algorithms to reach load balance, that high-complexity algorithms are well suited to parallel processing, that performance can be improved by about 100 times, and that the speedup ratio approaches the number of physical cores.
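The load-balancing idea in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' Hadoop implementation. It only compares a static pre-assignment of data slices with a dynamic scheme in which a master hands the next slice to whichever worker becomes idle first. The function names, the slice costs, and the worker count of 8 are assumptions chosen for illustration, and the slice list is assumed to be non-empty.

```python
import heapq


def static_schedule(costs, n_workers):
    """Pre-assign contiguous blocks of slices to workers; return the makespan."""
    block = -(-len(costs) // n_workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads)


def dynamic_schedule(costs, n_workers):
    """Give each slice to the worker that becomes idle first; return the makespan."""
    # Min-heap of (time at which the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    for cost in costs:
        t, w = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + cost, w))
    return max(t for t, _ in free_at)


def speedup(serial_time, parallel_time):
    """Speedup ratio: serial run time divided by parallel run time."""
    return serial_time / parallel_time


# Hypothetical workload: 8 expensive slices followed by 56 cheap ones,
# mimicking an iterative algorithm whose cost varies between slices.
costs = [8] * 8 + [1] * 56
serial = sum(costs)

print("static makespan :", static_schedule(costs, 8))
print("dynamic makespan:", dynamic_schedule(costs, 8))
print("dynamic speedup :", speedup(serial, dynamic_schedule(costs, 8)))
```

With this made-up workload, static assignment leaves one worker holding all the expensive slices (makespan 64), whereas dynamic assignment finishes in 15 time units, a speedup of 8 on 8 workers. Finer slices give the master more opportunities to even out the load, at the price of more scheduling overhead.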
Authors
DU Cui (杜翠)
CHENG Yuanshui (程远水)
ZHANG Qianli (张千里)
Affiliations: Railway Engineering Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China; State Key Laboratory for Track Technology of High-speed Railway, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China
Source
Railway Engineering (《铁道建筑》)
Peking University Core Journal (北大核心)
2022, No. 8, pp. 140-143 (4 pages)
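Funding
Science and Technology Innovation Project of China Energy Investment Corporation Limited (GJNY-20-231)
Open Fund of the State Key Laboratory of Coal Resources and Safe Mining (SKLCRSM21KFA06)
Fund of China Academy of Railway Sciences Corporation Limited (2019YJ060)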
Keywords
ground penetrating radar
railway detection
parallel computing
load balancing
dynamic scheduling
cluster mode
parallel granularity