Short-term traffic flow is one of the core technologies to realize traffic flow guidance. In this article, in view of the characteristics that the traffic flow changes repeatedly, a short-term traffic flow forecasting...Short-term traffic flow is one of the core technologies to realize traffic flow guidance. In this article, in view of the characteristics that the traffic flow changes repeatedly, a short-term traffic flow forecasting method based on a three-layer K-nearest neighbor non-parametric regression algorithm is proposed. Specifically, two screening layers based on shape similarity were introduced in K-nearest neighbor non-parametric regression method, and the forecasting results were output using the weighted averaging on the reciprocal values of the shape similarity distances and the most-similar-point distance adjustment method. According to the experimental results, the proposed algorithm has improved the predictive ability of the traditional K-nearest neighbor non-parametric regression method, and greatly enhanced the accuracy and real-time performance of short-term traffic flow forecasting.展开更多
It is well known that the nonparametric estimation of the regression function is highly sensitive to the presence of even a small proportion of outliers in the data.To solve the problem of typical observations when th...It is well known that the nonparametric estimation of the regression function is highly sensitive to the presence of even a small proportion of outliers in the data.To solve the problem of typical observations when the covariates of the nonparametric component are functional,the robust estimates for the regression parameter and regression operator are introduced.The main propose of the paper is to consider data-driven methods of selecting the number of neighbors in order to make the proposed processes fully automatic.We use thek Nearest Neighbors procedure(kNN)to construct the kernel estimator of the proposed robust model.Under some regularity conditions,we state consistency results for kNN functional estimators,which are uniform in the number of neighbors(UINN).Furthermore,a simulation study and an empirical application to a real data analysis of octane gasoline predictions are carried out to illustrate the higher predictive performances and the usefulness of the kNN approach.展开更多
Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), a widely adopted index by the US Environmental Protection Agency (EPA), serves as a crucial metric for rep...Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), a widely adopted index by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable regression model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different regression analyses using a machine learning approach to determine the model with the best performance. By employing the confusion matrix and error percentages, we selected the best-performing model, which yielded prediction error rates of 22%, 23%, 20%, and 27%, respectively, for LDA, QDA, logistic regression, and KNN models. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders like environmental regulators, healthcare professionals, urban planners, and researchers.展开更多
The computational cost of support vector regression in the training phase is O (N^3), which is very expensive for a large scale problem. In addition, the solution of support vector regression is of parsimoniousness,...The computational cost of support vector regression in the training phase is O (N^3), which is very expensive for a large scale problem. In addition, the solution of support vector regression is of parsimoniousness, which has relation to a part of the whole training data set. Hence, it is reasonable to reduce the training data set. Aiming at the scheme based on k-nearest neighbors to reduce the training data set with the computational complexity O (kMN^2), an improved scheme is proposed to accelerate the reducing phase, which cuts down the computational complexity from O (kMN^2) to O (MN^2). Finally, experimental results on benchmark data sets validate the effectiveness of the improved scheme.展开更多
A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressway. By using the historical traffic data collected from the detectors in Beijing expressways,...A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressway. By using the historical traffic data collected from the detectors in Beijing expressways, a specically designed database was developed via the processes including data filtering, wavelet analysis and clustering. The relativity based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. Then, a K-NN nonparametric regression model was built to predict the average travel speeds up to 6 min into the future. Several randomly selected travel speed data series, collected from the floating car data (FCD) system, were used to validate the model. The results indicate that using the FCD, the model can predict average travel speeds with an accuracy of above 90%, and hence is feasible and effective.展开更多
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data or sums to a constant, like 100%. The statistical linear mode...Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data or sums to a constant, like 100%. The statistical linear model is the most used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique used in various applications to find relationships between variables of interest. When estimating linear regression parameters which are useful for things like future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations including missing data. The EM algorithm repeatedly finds the best estimates of parameters in statistical models that depend on variables or data that have not been observed. This is called maximum likelihood or maximum a posteriori (MAP). Using the present estimate as input, the expectation (E) step constructs a log-likelihood function. Finding the parameters that maximize the anticipated log-likelihood, as determined in the E step, is the job of the maximization (M) phase. This study looked at how well the EM algorithm worked on a made-up compositional dataset with missing observations. It used both the robust least square version and ordinary least square regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation (), in terms of Aitchison distances and covariance.展开更多
In this paper,we develop and apply K-Nearest Neighbor algorithm to propagation pathloss regression.The path loss models present the dependency of attenuation value on distance using machine learning algorithms based o...In this paper,we develop and apply K-Nearest Neighbor algorithm to propagation pathloss regression.The path loss models present the dependency of attenuation value on distance using machine learning algorithms based on the experimental data.The algorithm is performed by choosing k nearest points and training dataset to find the optimal k value.The proposed method is applied to impove and adjust pathloss model at 28 GHz in Keangnam area,Hanoi,Vietnam.The experiments in both line-of-sight and non-line-of-sight scenarios used many combinations of transmit and receive antennas at different transmit antenna heights and random locations of receive antenna have been carried out using Wireless Insite Software.The results have been compared with 3GPP and NYU Wireless Path Loss Models in order to verify the performance of the proposed approach.展开更多
Existing interference protection systems lack automatic evaluation methods to provide scientific, objective and accurate assessment results. To address this issue, this paper develops a layout scheme by geometrically ...Existing interference protection systems lack automatic evaluation methods to provide scientific, objective and accurate assessment results. To address this issue, this paper develops a layout scheme by geometrically modeling the actual scene, so that the hand-held full-band spectrum analyzer would be able to collect signal field strength values for indoor complex scenes. An improved prediction algorithm based on the K-nearest neighbor non-parametric kernel regression was proposed to predict the signal field strengths for the whole plane before and after being shield. Then the highest accuracy set of data could be picked out by comparison. The experimental results show that the improved prediction algorithm based on the K-nearest neighbor non-parametric kernel regression can scientifically and objectively predict the indoor complex scenes’ signal strength and evaluate the interference protection with high accuracy.展开更多
文摘Short-term traffic flow is one of the core technologies to realize traffic flow guidance. In this article, in view of the characteristics that the traffic flow changes repeatedly, a short-term traffic flow forecasting method based on a three-layer K-nearest neighbor non-parametric regression algorithm is proposed. Specifically, two screening layers based on shape similarity were introduced in K-nearest neighbor non-parametric regression method, and the forecasting results were output using the weighted averaging on the reciprocal values of the shape similarity distances and the most-similar-point distance adjustment method. According to the experimental results, the proposed algorithm has improved the predictive ability of the traditional K-nearest neighbor non-parametric regression method, and greatly enhanced the accuracy and real-time performance of short-term traffic flow forecasting.
文摘It is well known that the nonparametric estimation of the regression function is highly sensitive to the presence of even a small proportion of outliers in the data.To solve the problem of typical observations when the covariates of the nonparametric component are functional,the robust estimates for the regression parameter and regression operator are introduced.The main propose of the paper is to consider data-driven methods of selecting the number of neighbors in order to make the proposed processes fully automatic.We use thek Nearest Neighbors procedure(kNN)to construct the kernel estimator of the proposed robust model.Under some regularity conditions,we state consistency results for kNN functional estimators,which are uniform in the number of neighbors(UINN).Furthermore,a simulation study and an empirical application to a real data analysis of octane gasoline predictions are carried out to illustrate the higher predictive performances and the usefulness of the kNN approach.
文摘Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), a widely adopted index by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable regression model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different regression analyses using a machine learning approach to determine the model with the best performance. By employing the confusion matrix and error percentages, we selected the best-performing model, which yielded prediction error rates of 22%, 23%, 20%, and 27%, respectively, for LDA, QDA, logistic regression, and KNN models. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders like environmental regulators, healthcare professionals, urban planners, and researchers.
基金supported by the National Natural Science Foundation of China(50576033).
文摘The computational cost of support vector regression in the training phase is O (N^3), which is very expensive for a large scale problem. In addition, the solution of support vector regression is of parsimoniousness, which has relation to a part of the whole training data set. Hence, it is reasonable to reduce the training data set. Aiming at the scheme based on k-nearest neighbors to reduce the training data set with the computational complexity O (kMN^2), an improved scheme is proposed to accelerate the reducing phase, which cuts down the computational complexity from O (kMN^2) to O (MN^2). Finally, experimental results on benchmark data sets validate the effectiveness of the improved scheme.
基金The Project of Research on Technologyand Devices for Traffic Guidance (Vehicle Navigation)System of Beijing Municipal Commission of Science and Technology(No H030630340320)the Project of Research on theIntelligence Traffic Information Platform of Beijing Education Committee
文摘A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressway. By using the historical traffic data collected from the detectors in Beijing expressways, a specically designed database was developed via the processes including data filtering, wavelet analysis and clustering. The relativity based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. Then, a K-NN nonparametric regression model was built to predict the average travel speeds up to 6 min into the future. Several randomly selected travel speed data series, collected from the floating car data (FCD) system, were used to validate the model. The results indicate that using the FCD, the model can predict average travel speeds with an accuracy of above 90%, and hence is feasible and effective.
文摘Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data or sums to a constant, like 100%. The statistical linear model is the most used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique used in various applications to find relationships between variables of interest. When estimating linear regression parameters which are useful for things like future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations including missing data. The EM algorithm repeatedly finds the best estimates of parameters in statistical models that depend on variables or data that have not been observed. This is called maximum likelihood or maximum a posteriori (MAP). Using the present estimate as input, the expectation (E) step constructs a log-likelihood function. Finding the parameters that maximize the anticipated log-likelihood, as determined in the E step, is the job of the maximization (M) phase. This study looked at how well the EM algorithm worked on a made-up compositional dataset with missing observations. It used both the robust least square version and ordinary least square regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation (), in terms of Aitchison distances and covariance.
基金This work is carried out in the framework of the project supported by the Department of Science and Technology of Kien Giang,Vietnam.The authors would like to thank them for supporting this research。
文摘In this paper,we develop and apply K-Nearest Neighbor algorithm to propagation pathloss regression.The path loss models present the dependency of attenuation value on distance using machine learning algorithms based on the experimental data.The algorithm is performed by choosing k nearest points and training dataset to find the optimal k value.The proposed method is applied to impove and adjust pathloss model at 28 GHz in Keangnam area,Hanoi,Vietnam.The experiments in both line-of-sight and non-line-of-sight scenarios used many combinations of transmit and receive antennas at different transmit antenna heights and random locations of receive antenna have been carried out using Wireless Insite Software.The results have been compared with 3GPP and NYU Wireless Path Loss Models in order to verify the performance of the proposed approach.
基金the National Natural Science Foundation of China under projects 61772150 and 61862012the Guangxi Key R&D Program under project AB17195025+5 种基金the Guangxi Natural Science Foundation under grants 2018GXNSFDA281054 and 2018GXNSFAA281232the National Cryptography Development Fund of China under project MMJJ20170217the Guangxi Science and Technology Base and Special Talents Program AD18281044the Innovation Project of GUET Graduate Education under project 2017YJCX46the Guangxi Young Teachers’ Basic Ability Improvement Program under Grant 2018KY0194the open program of Guangxi Key Laboratory of Cryptography and Information Security under projects GCIS201621 and GCIS201702.
文摘Existing interference protection systems lack automatic evaluation methods to provide scientific, objective and accurate assessment results. To address this issue, this paper develops a layout scheme by geometrically modeling the actual scene, so that the hand-held full-band spectrum analyzer would be able to collect signal field strength values for indoor complex scenes. An improved prediction algorithm based on the K-nearest neighbor non-parametric kernel regression was proposed to predict the signal field strengths for the whole plane before and after being shield. Then the highest accuracy set of data could be picked out by comparison. The experimental results show that the improved prediction algorithm based on the K-nearest neighbor non-parametric kernel regression can scientifically and objectively predict the indoor complex scenes’ signal strength and evaluate the interference protection with high accuracy.