Abstract
Sensors can lose data or record anomalous values during collection because of weather or equipment failure, which prevents accurate traffic patterns from being extracted from the collected data. To address this problem, two traffic flow missing-data repair models are proposed, one based on an improved nearest neighbor interpolation algorithm and one based on random forest imputation. Because missing-data scenarios, missing types, and spatio-temporal correlations differ, missing data are divided into two types: simple random missing and complex continuous missing. The improved nearest neighbor interpolation algorithm is used to build a model for simple random missing data, and a random forest model performs iterative imputation for complex continuous missing data. For both missing types, comparison models built with the expectation maximization algorithm, a deep belief network, and a seasonal autoregressive integrated moving average model are used to cross-validate the improved nearest neighbor interpolation algorithm and the random forest imputation method. The data are traffic flow measurements collected in real time by the Performance Measurement System (PeMS) in California, USA, from June 1, 2022 to July 31, 2022 at a 5 min sampling interval. To simulate data loss, values are removed from the complete data at set proportions, yielding traffic flow datasets with simple random and complex continuous missing distributions. The results show that the two models perform well across the tested missing ratios and missing types and show clear advantages on every evaluation metric, which verifies the effectiveness of the two missing-data filling models.
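The missing-data simulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the series is synthetic rather than PeMS data, the function names are hypothetical, and `nearest_fill` is a plain temporal nearest neighbor fill standing in for the paper's improved algorithm, whose details the abstract does not give.

```python
import math
import random


def mask_random(n, ratio, rng):
    """Simple random missing: drop round(n * ratio) points chosen independently."""
    return set(rng.sample(range(n), round(n * ratio)))


def mask_continuous(n, ratio, rng):
    """Complex continuous missing: drop one contiguous block of round(n * ratio) points."""
    k = round(n * ratio)
    start = rng.randrange(0, n - k + 1)
    return set(range(start, start + k))


def apply_mask(series, missing):
    """Replace the values at the missing indices with None."""
    return [None if i in missing else v for i, v in enumerate(series)]


def nearest_fill(series):
    """Fill each gap with the closest observed value in time (ties take the earlier one)."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is None:
            j = min(observed, key=lambda o: (abs(o - i), o))
            filled[i] = series[j]
    return filled


def rmse(truth, estimate, missing):
    """Root mean squared error over the artificially removed points only."""
    return math.sqrt(sum((truth[i] - estimate[i]) ** 2 for i in missing) / len(missing))


# One synthetic day at a 5 min interval (288 samples); the values are made up.
rng = random.Random(0)
flow = [200 + 150 * math.sin(2 * math.pi * t / 288) for t in range(288)]

for name, make_mask in (("random", mask_random), ("continuous", mask_continuous)):
    missing = make_mask(len(flow), 0.2, rng)
    repaired = nearest_fill(apply_mask(flow, missing))
    print(name, "RMSE:", round(rmse(flow, repaired, missing), 2))
```

A nearest neighbor fill of this kind would be expected to degrade on long contiguous gaps, which is consistent with the abstract's choice of a separate random forest model for complex continuous missing data.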
Authors
TANG Wei (汤伟)
QI Su-ying (漆苏应)
YANG Xiao-dong (杨晓东)
LI Guo-qiang (李国强)
Affiliations: College of Electrical and Control Engineering, Shaanxi University of Science & Technology, Xi'an 710021, China; Xi'an Jinlu Traffic Engineering Technology Development Co., Ltd., Xi'an 710075, China
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2024, Issue 32, pp. 14056-14065 (10 pages)
Funding
Shaanxi Key Research and Development Program (2022GY-335).
Keywords
intelligent transportation
missing data repair
random forest (RF)
nearest neighbor interpolation algorithm
traffic operation management