Abstract: Wildlife-vehicle collisions (WVCs) with large animals are estimated to cost the USA over 8 billion USD in property damage, tens of thousands of human injuries and nearly 200 human fatalities each year. Most WVCs occur on rural roads and are not recorded evenly among road segments, leading to imbalanced data. When large geographic areas are investigated for collision risk, a disproportionate number of analysis units have zero WVC cases. Analysis units with zero WVCs can reduce prediction accuracy and weaken the coefficient estimates of statistical learning models. This study demonstrates that using the synthetic minority over-sampling technique (SMOTE) to handle imbalanced WVC data, in combination with statistical and machine-learning models, improves the ability to determine seasonal WVC risk across the rural highway network in Montana, USA. An array of regularized variables describing landscape, road and traffic was used to develop negative binomial and random forest models to infer WVC rates per 100 million vehicle miles travelled. The random forest model is found to work particularly well with SMOTE-augmented data, improving the prediction accuracy of seasonal WVC risk. SMOTE-augmented data are found to improve accuracy when predicting crash risk across fine-grained grids while retaining the characteristics of the original dataset. The analyses suggest that SMOTE augmentation mitigates the data imbalance encountered in seasonally divided WVC data. This research provides a basis for future risk-mapping models and can potentially be used to address the low rates of WVCs and other crash types along rural roads.
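The core idea of SMOTE can be illustrated with a minimal sketch in plain Python. This is not the study's pipeline, which the abstract does not describe in detail: the function name `smote`, the two toy features (a habitat score and a daily traffic volume) and all numeric values are hypothetical and chosen only for illustration. The sketch shows the general technique: each synthetic sample is placed at a random point on the line between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (the under-represented class,
    here standing in for analysis units that recorded at least one WVC).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (habitat score, average daily traffic) per road unit
units_with_wvc = [(0.8, 1200.0), (0.6, 900.0), (0.9, 1500.0), (0.7, 1100.0)]
new_rows = smote(units_with_wvc, n_synthetic=6, k=2, seed=42)
for row in new_rows:
    print(row)
```

Because every synthetic row is a convex combination of two observed rows, it stays within the range of the original minority data, which is consistent with the abstract's remark that the augmented data retain the characteristics of the original dataset. In practice, a maintained implementation such as the one in the imbalanced-learn package would normally be preferred over a hand-written version, and features on very different scales would usually be standardized before computing distances.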