Short-term traffic flow forecasting is one of the core technologies for realizing traffic flow guidance. In view of the repetitive nature of traffic flow changes, this article proposes a short-term traffic flow forecasting method based on a three-layer K-nearest neighbor non-parametric regression algorithm. Specifically, two screening layers based on shape similarity are introduced into the K-nearest neighbor non-parametric regression method, and the forecasts are output using weighted averaging on the reciprocals of the shape similarity distances together with the most-similar-point distance adjustment method. According to the experimental results, the proposed algorithm improves the predictive ability of the traditional K-nearest neighbor non-parametric regression method and greatly enhances the accuracy and real-time performance of short-term traffic flow forecasting.
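The reciprocal-distance weighting mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' three-layer algorithm: the two shape-similarity screening layers and the most-similar-point adjustment are omitted, plain Euclidean distance stands in for the shape similarity distance, and the function name, the `eps` guard, and the toy data are all assumptions made for illustration.

```python
def knn_inverse_distance_forecast(history, query, k=3, eps=1e-9):
    """Forecast the next value as a reciprocal-distance weighted average.

    history: list of (pattern, next_value) pairs from past observations.
    query:   the most recent pattern, same length as each stored pattern.
    eps:     hypothetical guard against division by zero on exact matches.
    """
    def distance(a, b):
        # Plain Euclidean distance; an assumed stand-in for the paper's
        # shape similarity distance, which the abstract does not define.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # Rank stored patterns by distance to the query and keep the k closest.
    scored = sorted((distance(pattern, query), nxt) for pattern, nxt in history)
    nearest = scored[:k]

    # Weight each neighbor by the reciprocal of its distance.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


# Toy flow patterns (vehicles per interval) paired with the value that followed.
history = [([1, 2, 3], 4), ([2, 3, 4], 5), ([10, 11, 12], 13)]
forecast = knn_inverse_distance_forecast(history, [1, 2, 3], k=2)
```

Because the query matches the first stored pattern exactly, that neighbor dominates the weighted average and the forecast lands very close to 4.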
Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), an index widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four analyses using a machine learning approach to determine the model with the best performance. Comparing the models by confusion matrix and error percentage gave prediction error rates of 22%, 23%, 20%, and 27% for LDA, QDA, logistic regression, and KNN, respectively. The logistic regression model thus outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders such as environmental regulators, healthcare professionals, urban planners, and researchers.
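As a rough illustration of how a prediction error rate is read off a confusion matrix, the sketch below counts the off-diagonal entries. The category labels and the sample predictions are made up for the example and are not data from the study.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def error_rate(matrix):
    """Share of samples that fall off the diagonal (misclassified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


# Hypothetical AQI categories and predictions, for illustration only.
labels = ["good", "moderate", "unhealthy"]
actual = ["good", "good", "moderate", "unhealthy", "moderate"]
predicted = ["good", "moderate", "moderate", "unhealthy", "moderate"]

matrix = confusion_matrix(actual, predicted, labels)
rate = error_rate(matrix)  # one of five samples is misclassified
```

The same two functions can be applied to the predictions of each candidate model, and the model with the lowest error rate is then preferred.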
The computational cost of support vector regression in the training phase is O(N^3), which is very expensive for large-scale problems. In addition, the solution of support vector regression is sparse (parsimonious): it depends on only part of the training data set. Hence, it is reasonable to reduce the training data set. For the existing scheme that reduces the training data set using k-nearest neighbors, whose computational complexity is O(kMN^2), an improved scheme is proposed that accelerates the reduction phase, cutting the computational complexity from O(kMN^2) to O(MN^2). Finally, experimental results on benchmark data sets validate the effectiveness of the improved scheme.
A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed on Beijing expressways. Using historical traffic data collected from detectors on Beijing expressways, a specifically designed database was developed through data filtering, wavelet analysis, and clustering. The relativity-based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. A K-NN nonparametric regression model was then built to predict average travel speeds up to 6 min into the future. Several randomly selected travel speed data series, collected from the floating car data (FCD) system, were used to validate the model. The results indicate that, using the FCD, the model can predict average travel speeds with an accuracy above 90%, and hence is feasible and effective.
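A weighted Euclidean distance of the kind named in this abstract can be sketched as follows. The weights here are arbitrary placeholders, since the abstract does not say how the relativity-based weights are derived, and the plain average over the K nearest series is likewise an assumption rather than the authors' model.

```python
import math


def weighted_euclidean(a, b, weights):
    """Euclidean distance with a per-component weight on each squared gap."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def predict_speed(database, current, weights, k=2):
    """Average the future speeds of the k stored series closest to `current`.

    database: list of (speed_series, future_speed) pairs (hypothetical layout).
    """
    ranked = sorted(
        database,
        key=lambda record: weighted_euclidean(record[0], current, weights),
    )
    nearest = ranked[:k]
    return sum(future for _, future in nearest) / len(nearest)


# Toy speed series in km/h; the most recent reading gets the largest weight.
database = [([60, 58, 55], 52), ([61, 59, 56], 54), ([30, 28, 25], 22)]
weights = [0.2, 0.3, 0.5]  # placeholder weights, not taken from the paper
speed = predict_speed(database, [60, 58, 55.5], weights, k=2)
```

With these toy values the two free-flow series are selected and the congested one is ignored, so the predicted speed is the mean of 52 and 54.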
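Adulteration detection and geographical origin identification of yak milk powder help ensure food safety and protect consumer rights, and are important measures for promoting the healthy development of the dairy market. Traditional DNA testing methods and stable isotope analysis have long detection cycles and struggle to meet the need for rapid, low-cost on-site analysis. To address these problems, this study established a method based on near-infrared spectroscopy (NIRS) for rapidly identifying adulteration and the geographical origin of yak milk powder. Yak milk powder of 9 brands from Sichuan, Gansu, Yunnan, and Qinghai was collected. Before the adulterated samples were prepared, polymerase chain reaction (PCR) and DNA gel electrophoresis were used to verify whether the collected yak milk powder had been adulterated with cow milk powder. After verification, the adulterated samples were prepared and the near-infrared spectral data were acquired. K-nearest neighbors (KNN) was used to build the classification model, and partial least squares regression (PLSR) was used to build the quantitative prediction model. The predictive ability of the quantitative model was further improved by optimizing the spectral preprocessing and variable selection methods. The results show that KNN achieved 100% correct classification for both adulteration detection (pure cow milk powder, pure yak milk powder, and yak milk powder adulterated with cow milk powder) and origin identification (Sichuan, Gansu, Yunnan, Qinghai). For the quantitative adulteration model, the correlation coefficient of the calibration set (R_c) was 0.9975, the correlation coefficient of the prediction set (R_p) was 0.9913, the root mean square error of prediction (RMSEP) was 1.9823%, and the ratio of performance to deviation (RPD) was 7.2522. The method can rapidly and accurately predict the level of cow milk powder adulteration in yak milk powder and identify the origin of yak milk powder, providing technical support for yak milk powder quality control.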
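For readers unfamiliar with the two quantitative metrics reported above, the sketch below computes RMSEP and RPD under their common chemometrics definitions: RMSEP as the root mean square of the prediction-set residuals, and RPD as the standard deviation of the reference values divided by RMSEP. The use of the sample standard deviation and the toy adulteration levels are assumptions; the abstract does not state which convention the study used.

```python
import math
import statistics


def rmsep(reference, predicted):
    """Root mean square error of prediction over the prediction set."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSEP.

    Uses the sample standard deviation (n - 1 denominator), an assumption.
    """
    return statistics.stdev(reference) / rmsep(reference, predicted)


# Hypothetical adulteration levels in percent, for illustration only.
reference = [10, 20, 30, 40]
predicted = [12, 18, 32, 38]

error = rmsep(reference, predicted)
ratio = rpd(reference, predicted)
```

A larger RPD means the prediction error is small relative to the natural spread of the reference values, which is why it is reported alongside RMSEP.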
Funding: Supported by the National Natural Science Foundation of China (50576033).
Funding: The Project of Research on Technology and Devices for Traffic Guidance (Vehicle Navigation) System of Beijing Municipal Commission of Science and Technology (No. H030630340320); the Project of Research on the Intelligence Traffic Information Platform of Beijing Education Committee.