To ensure that the large-scale deployment of photovoltaic (PV) power generation does not compromise grid stability, accurate PV power generation forecasting is essential. A short-term PV power generation forecasting method is proposed that combines K-means++, grey relational analysis (GRA) and support vector regression (SVR) with feature selection (Hybrid Kmeans-GRA-SVR, HKGSVR). Historical power data are clustered with a multi-index K-means++ algorithm and divided into ideal and non-ideal weather types. GRA is then used to identify the similar day and the nearest-neighbor similar day of the day to be predicted, and appropriate input features are selected for each weather type to train the SVR model. Under ideal weather, the average MAE, RMSE and R² were 0.8101 kW, 0.9608 kW and 99.66%, respectively, and the method reduced the average training time by 77.27% relative to the standard SVR model. Under non-ideal weather, the average MAE, RMSE and R² were 1.8337 kW, 2.1379 kW and 98.47%, respectively, and the method reduced the average training time by 98.07% relative to the standard SVR model. The experimental results show that the proposed model achieves significantly higher prediction accuracy than five comparison models, which verifies the effectiveness of the method.
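The abstract names GRA for similar-day matching and reports MAE, RMSE and R², but does not give the formulas. The sketch below is an assumption-laden illustration, not the HKGSVR implementation: it uses the textbook grey relational coefficient with distinguishing coefficient ρ = 0.5, min-max normalisation of each feature across all days, and equal feature weights, and the function names are hypothetical. Candidate days are ranked by their grey relational grade with respect to the day to be predicted (a higher grade means a more similar day). The metric helpers use the standard definitions of MAE, RMSE and R².

```python
import math
from typing import Dict, Sequence


def grey_relational_grades(target: Sequence[float],
                           candidates: Dict[str, Sequence[float]],
                           rho: float = 0.5) -> Dict[str, float]:
    """Grey relational grade of each candidate day with respect to the target day."""
    names = list(candidates)
    days = [list(target)] + [list(candidates[name]) for name in names]
    n_feat = len(target)
    # Min-max normalise each feature across all days so that units do not dominate.
    for k in range(n_feat):
        lo = min(day[k] for day in days)
        hi = max(day[k] for day in days)
        span = (hi - lo) or 1.0  # constant feature: avoid division by zero
        for day in days:
            day[k] = (day[k] - lo) / span
    ref = days[0]
    # Absolute differences between the target and each candidate, per feature.
    deltas = [[abs(ref[k] - day[k]) for k in range(n_feat)] for day in days[1:]]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    grades: Dict[str, float] = {}
    for name, d in zip(names, deltas):
        if d_max == 0.0:
            grades[name] = 1.0  # every candidate coincides with the target
            continue
        # Grey relational coefficient for each feature.
        coeffs = [(d_min + rho * d_max) / (dk + rho * d_max) for dk in d]
        grades[name] = sum(coeffs) / n_feat  # equal feature weights (assumption)
    return grades


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination (multiply by 100 for a percentage)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In use, each candidate day would be represented by the same feature vector as the target day (which meteorological features to include is exactly what the method's feature selection decides), and the highest-graded days would supply training samples for the SVR.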
Compositional data, such as relative information, are an important topic in machine learning and related fields. They are typically recorded in closed form, meaning that the components sum to a constant such as 100%. The linear model is the most widely used statistical technique for identifying hidden relationships between underlying random variables of interest. Data quality is a significant challenge in machine learning, especially when data are missing. Linear regression is a common modeling technique used in many applications to find relationships between variables of interest, and maximum likelihood estimation (MLE) is the method of choice for estimating its parameters, which are used for purposes such as prediction and analysis of the partial effects of independent variables. However, many datasets contain missing observations, and recovering them can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been proposed for situations involving missing data. The EM algorithm iteratively computes maximum likelihood or maximum a posteriori (MAP) estimates of the parameters of statistical models that depend on unobserved variables or data. Given the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize this expected log-likelihood. This study examined how well the EM algorithm performs on a synthetic compositional dataset with missing observations, using both ordinary least squares regression and a robust least squares version.
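The E and M steps described above can be illustrated on a small case: a bivariate normal pair (x, y) in which either value of a row may be missing. This is a minimal sketch and not the study's implementation; it assumes unconstrained, normally distributed variables (compositional parts would first need a log-ratio transform, which is omitted here), and the function names are hypothetical. The E-step replaces the sufficient statistics of each missing value with their conditional expectations given the observed value, the M-step recomputes the maximum likelihood mean and covariance, and the regression line of y on x follows from the estimated moments.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[float], Optional[float]]
Moments = Tuple[float, float, float, float, float]  # (mx, my, sxx, syy, sxy)


def em_bivariate_normal(rows: List[Row], max_iter: int = 1000,
                        tol: float = 1e-12) -> Moments:
    """EM estimates of the mean and covariance of (x, y); None marks a missing value."""
    complete = [(x, y) for x, y in rows if x is not None and y is not None]
    if len(complete) < 2:
        raise ValueError("need at least two complete rows to initialise EM")
    # Initialise the parameters from the complete cases.
    k = len(complete)
    mx = sum(x for x, _ in complete) / k
    my = sum(y for _, y in complete) / k
    sxx = sum((x - mx) ** 2 for x, _ in complete) / k
    syy = sum((y - my) ** 2 for _, y in complete) / k
    sxy = sum((x - mx) * (y - my) for x, y in complete) / k
    n = len(rows)
    for _ in range(max_iter):
        # E-step: expected sums of x, y, x^2, y^2 and xy under the current parameters.
        s_x = s_y = s_xx = s_yy = s_xy = 0.0
        for x, y in rows:
            if x is not None and y is not None:
                ex, ey, exx, eyy, exy = x, y, x * x, y * y, x * y
            elif x is not None:
                # y missing: conditional mean and variance of y given x.
                ey = my + sxy / sxx * (x - mx)
                ex, exx, eyy, exy = x, x * x, ey * ey + syy - sxy ** 2 / sxx, x * ey
            elif y is not None:
                # x missing: conditional mean and variance of x given y.
                ex = mx + sxy / syy * (y - my)
                ey, eyy, exx, exy = y, y * y, ex * ex + sxx - sxy ** 2 / syy, ex * y
            else:
                # Both missing: fall back to the current marginal moments.
                ex, ey = mx, my
                exx, eyy, exy = sxx + mx * mx, syy + my * my, sxy + mx * my
            s_x += ex
            s_y += ey
            s_xx += exx
            s_yy += eyy
            s_xy += exy
        # M-step: maximum likelihood mean and covariance from the expected sums.
        nmx, nmy = s_x / n, s_y / n
        updated = (nmx, nmy, s_xx / n - nmx ** 2, s_yy / n - nmy ** 2, s_xy / n - nmx * nmy)
        change = max(abs(a - b) for a, b in zip(updated, (mx, my, sxx, syy, sxy)))
        mx, my, sxx, syy, sxy = updated
        if change < tol:
            break
    return mx, my, sxx, syy, sxy


def regression_line(moments: Moments) -> Tuple[float, float]:
    """Intercept and slope of the regression of y on x implied by the moments."""
    mx, my, sxx, _, sxy = moments
    slope = sxy / sxx
    return my - slope * mx, slope
```

When only y is missing and x is fully observed, the EM regression line coincides with ordinary least squares on the complete rows, so the sketch is most informative when predictor values are missing as well.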
The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-nearest neighbor (k-NN) imputation and mean imputation, in terms of Aitchison distance and covariance.
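The comparison relies on two baseline imputers and on the Aitchison distance, which for strictly positive compositions equals the Euclidean distance between their centred log-ratio (clr) vectors. The following is a minimal sketch under assumptions: the k-NN variant shown (Euclidean distance on the row's observed columns, neighbour mean for each missing entry, k = 2 by default) is one common choice and not necessarily the one used in the study, and all function names are hypothetical. Because the Aitchison distance is scale invariant, imputed rows do not need to be re-closed to a constant sum before they are compared with the true compositions.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def clr(parts: List[float]) -> List[float]:
    """Centred log-ratio transform: log of each part minus the mean log (parts > 0)."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(a: List[float], b: List[float]) -> float:
    """Aitchison distance as the Euclidean distance between clr vectors."""
    return math.dist(clr(a), clr(b))


def mean_impute(rows: List[Row]) -> List[List[float]]:
    """Replace each missing entry with the mean of the observed values in its column."""
    n_col = len(rows[0])
    means = []
    for j in range(n_col):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_col)] for r in rows]


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing entry with the mean of that column over the k nearest
    complete rows, using Euclidean distance on the row's observed columns."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("k-NN imputation needs at least one complete row")
    out = []
    for r in rows:
        obs = [j for j, v in enumerate(r) if v is not None]
        if len(obs) == len(r):
            out.append(list(r))  # nothing to impute
            continue
        neighbours = sorted(
            complete,
            key=lambda c: math.sqrt(sum((r[j] - c[j]) ** 2 for j in obs)),
        )[:k]
        out.append([
            v if v is not None else sum(c[j] for c in neighbours) / len(neighbours)
            for j, v in enumerate(r)
        ])
    return out
```

To score an imputer, one would compute aitchison_distance between each imputed row and its known true composition and average over the rows that originally had missing parts.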