Funding: Partially supported by the National High-Tech Research and Development (863) Program of China (Nos. 2009AA11Z206 and 2011AA110401), the National Natural Science Foundation of China (Nos. 60721003 and 60834001), and the Tsinghua University Innovation Research Program (No. 2009THZ0)
Abstract: Complete and reliable field traffic data is vital for the planning, design, and operation of urban traffic management systems. However, the data held in many traffic information systems is often very incomplete, which hinders its effective use. Methods are needed for imputing missing traffic data to minimize the effect of incomplete data on its utilization. This paper presents an improved Local Least Squares (LLS) approach to impute the incomplete data. The LLS is an improved version of the K Nearest Neighbor (KNN) method. First, the missing traffic data is replaced by a row average of the known values. Then, the vector angle and Euclidean distance are used to select the nearest neighbors. Finally, a regression step is used to obtain the weights of the nearest neighbors and the imputation results. Traffic flow volume collected in Beijing was analyzed to compare this approach with the Bayesian Principal Component Analysis (BPCA) imputation approach. Tests show that this approach performs slightly better than BPCA imputation in imputing missing traffic data.
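
The three steps named in the abstract (row-average fill, neighbor selection by vector angle and Euclidean distance, least-squares weighting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `lls_impute`, the parameter `k`, the data layout (rows as series, columns as time slots), and in particular the rule that combines distance and angle into one score are all assumptions made for illustration, since the abstract does not specify them. The sketch also assumes every row has at least one known value.

```python
import numpy as np


def lls_impute(data, k=2):
    """Impute NaN entries of a 2-D array (hypothetical helper, not from the paper)."""
    x = np.asarray(data, dtype=float)
    missing = np.isnan(x)

    # Step 1: provisionally replace missing values by the row average of known values.
    row_mean = np.nanmean(x, axis=1, keepdims=True)
    filled = np.where(missing, row_mean, x)
    result = filled.copy()

    for i in np.flatnonzero(missing.any(axis=1)):
        known = ~missing[i]
        target = x[i, known]                       # known part of the incomplete row
        others = np.delete(np.arange(x.shape[0]), i)
        cand = filled[others][:, known]            # candidate rows on the same columns

        # Step 2: rank candidates by Euclidean distance and vector angle.
        dist = np.linalg.norm(cand - target, axis=1)
        cos = cand @ target / (
            np.linalg.norm(cand, axis=1) * np.linalg.norm(target) + 1e-12
        )
        angle = np.arccos(np.clip(cos, -1.0, 1.0))
        score = dist * (1.0 + angle)               # ASSUMED combination rule
        nearest = others[np.argsort(score)[:k]]

        # Step 3: regress the target's known values on the k nearest neighbors,
        # then apply the fitted weights to the neighbors' values in the missing columns.
        a = filled[nearest][:, known]              # shape (k, number of known columns)
        w, *_ = np.linalg.lstsq(a.T, target, rcond=None)
        result[i, missing[i]] = w @ filled[nearest][:, missing[i]]

    return result


# Toy data: the last row equals row 0 + row 1, with one entry removed.
demo = np.array([
    [10.0, 20.0, 30.0, 40.0],
    [40.0, 30.0, 20.0, 10.0],
    [100.0, 5.0, 70.0, 3.0],
    [50.0, 50.0, np.nan, 50.0],
])
out = lls_impute(demo, k=2)
print(out[3, 2])
```

On this toy input the two selected neighbors reproduce the incomplete row exactly, so the imputed value comes out close to 50. Real traffic flow data would not fit this cleanly, and the choice of `k` and of the similarity score would need to be tuned against held-out observations.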