Road Crash Risk Prediction Model for Continuous Streaming Data Environment (连续数据环境下的道路交通事故风险预测模型) Cited by: 19
Abstract: This paper develops a road crash risk prediction model for an environment in which urban traffic is continuously observed and dynamically managed (referred to as a continuous data environment), as in an active traffic management (ATM) system. A traffic crash is a low-probability event, so crashes are heavily outnumbered by non-crash cases and crash risk prediction becomes an imbalanced data classification problem. Most existing studies build their models with a "matched case-control" under-sampling method, in which non-crash cases are drawn from the continuous traffic flow data at a certain ratio for each crash; the prediction accuracy of such models in a continuous data environment is inadequate. This study instead proposes building the model on the full set of traffic flow data and handling the class imbalance by adjusting the classification threshold that separates crashes from non-crashes. Loop detector traffic flow data and historical crash data from the Shanghai urban expressway system for May and June 2014 were used in an empirical study. The area under the receiver operating characteristic curve (AUC) was used to compare the overall predictive capability of commonly used crash risk prediction models (logistic regression and random forest) built on the full data set and on the sampled data set. The geometric mean of sensitivity and specificity was used to compare the effect of three ways of calculating the classification threshold (the Youden's index method, the crash proportion method, and the cross point method) on the combined prediction accuracy for crash and non-crash cases. The results show that, in a continuous data environment, modeling with the full data set improves the overall predictive capability of the model by 13.06%, and that calculating the classification threshold with the Youden's index method gives the best combined prediction accuracy for crash and non-crash cases.
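The threshold-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, all function names are hypothetical, and the definitions of two of the methods are assumptions. Here the crash proportion method (事故占比法) is taken to set the threshold equal to the share of crashes in the sample, and the cross point method (交叉点法) is taken to pick the threshold where sensitivity and specificity are closest to equal. The Youden's index method uses the standard definition, maximizing sensitivity + specificity - 1.

```python
import math
import random


def rates(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is flagged as a crash.

    Assumes both classes (1 = crash, 0 = non-crash) are present in labels.
    """
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if y == 1:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return tp / (tp + fn), tn / (tn + fp)


def g_mean(scores, labels, threshold):
    """Geometric mean of sensitivity and specificity at a given threshold."""
    sens, spec = rates(scores, labels, threshold)
    return math.sqrt(sens * spec)


def youden_threshold(scores, labels):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    def j(t):
        sens, spec = rates(scores, labels, t)
        return sens + spec - 1
    return max(sorted(set(scores)), key=j)


def crash_proportion_threshold(labels):
    """Assumed definition: threshold equals the share of crashes in the sample."""
    return sum(labels) / len(labels)


def cross_point_threshold(scores, labels):
    """Assumed definition: threshold where sensitivity and specificity are closest."""
    def gap(t):
        sens, spec = rates(scores, labels, t)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


def synthetic_sample(n=1000, crash_rate=0.05, seed=0):
    """Synthetic imbalanced sample; NOT the Shanghai expressway data."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < crash_rate else 0 for _ in range(n)]
    # Crash cases get somewhat higher predicted risk scores on average.
    scores = [rng.betavariate(2, 10) if y else rng.betavariate(1, 20)
              for y in labels]
    return scores, labels


if __name__ == "__main__":
    scores, labels = synthetic_sample()
    thresholds = {
        "Youden's index": youden_threshold(scores, labels),
        "crash proportion": crash_proportion_threshold(labels),
        "cross point": cross_point_threshold(scores, labels),
    }
    for name, t in thresholds.items():
        print(f"{name}: threshold={t:.3f}, "
              f"G-mean={g_mean(scores, labels, t):.3f}")
```

In this sketch the scores stand in for the predicted crash probabilities of a model fitted on the full sample. With so few crashes, a default cut-off of 0.5 would flag almost nothing as a crash, which is why the threshold is moved instead of resampling the data.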
Authors: GAO Zhen (高珍); GAO Yi (高屹); YU Rong-jie (余荣杰); HUANG Zhi-qiang (黄智强); WANG Xue-song (王雪松) (School of Software Engineering, Tongji University, Shanghai 201804, China; Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, Shanghai 201804, China)
Source: China Journal of Highway and Transport (《中国公路学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2018, No. 4, pp. 280-287 (8 pages)
Funding: National Natural Science Foundation of China (71401127, 51522810); Science and Technology Commission of Shanghai Municipality (15DZ1204800)
Keywords: traffic engineering; continuous data environment; crash risk prediction model; imbalanced data; binary classification threshold; urban expressway