Abstract
In rail transit, intelligent vehicle operation and maintenance systems apply big data technologies to the real-time computation and analysis of vehicle status data. This enables status monitoring and fault early warning, guides on-site maintenance of key equipment, and improves the efficiency of vehicle operation and maintenance personnel. However, the unstable train-to-ground network communication environment can cause data floods in the onboard data sent down by individual trains, and a big data stream processing framework running with a general-purpose configuration hits a performance bottleneck in this scenario. This paper analyzes the causes of the bottleneck in depth and proposes an optimization method based on a custom Kafka partitioning rule and tuned Spark Streaming parameters: data from different vehicles is written to different Kafka partitions, and the rate at which Spark executors read data from Kafka is controlled. Results from application in an actual project show that the method effectively resolves the bottleneck that arises when the stream processing rate cannot keep up with the data read rate during a data flood.
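The two ideas named in the abstract, per-vehicle partitioning and a capped read rate, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the CRC32-based mapping, and the numeric values are assumptions made for illustration. A hash-based mapping only guarantees that one vehicle always lands in the same partition; two vehicles can still share a partition, so a strict one-vehicle-per-partition rule would need an explicit assignment table. The two configuration keys are standard Spark Streaming settings, but the values shown are placeholders, not the tuned values from the paper.

```python
import zlib


def vehicle_partition(vehicle_id: str, num_partitions: int) -> int:
    """Map a vehicle ID to a stable partition index (hypothetical rule)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(vehicle_id.encode("utf-8")) % num_partitions


def max_records_per_batch(max_rate_per_partition: int,
                          num_partitions: int,
                          batch_seconds: int) -> int:
    """Upper bound on records read in one micro-batch under a per-partition cap."""
    return max_rate_per_partition * num_partitions * batch_seconds


# Illustrative values only (assumption); the paper's tuned values are not given here.
SPARK_RATE_LIMIT_CONF = {
    # Let Spark adapt the ingestion rate to the observed processing rate.
    "spark.streaming.backpressure.enabled": "true",
    # Hard cap on records per second read from each Kafka partition.
    "spark.streaming.kafka.maxRatePerPartition": "1000",
}
```

With a per-partition cap in place, a flood from one vehicle fills only that vehicle's partition, and each micro-batch reads at most `max_records_per_batch(...)` records, so the batch processing time stays bounded.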
Authors
汤鹏飞
胡卫民
杨永滔
TANG Pengfei; HU Weimin; YANG Yongtao (Zhuzhou CRRC Times Electric Co., Ltd., Zhuzhou, Hunan 412001, China)
Source
《控制与信息技术》
2022, Issue 6, pp. 91-98 (8 pages)
CONTROL AND INFORMATION TECHNOLOGY
Keywords
big data
streaming data processing
performance optimization
data flood
intelligent operation and maintenance
rail transit