Ensemble prediction is widely used to represent the uncertainty of single deterministic Numerical Weather Prediction (NWP) caused by errors in initial conditions (ICs). The traditional Singular Vector (SV) initial perturbation method tends to capture only synoptic-scale initial uncertainty rather than mesoscale uncertainty in global ensemble prediction. To address this issue, a multiscale SV initial perturbation method based on the China Meteorological Administration Global Ensemble Prediction System (CMA-GEPS) is proposed to quantify multiscale initial uncertainty. The multiscale SV initial perturbation approach entails calculating multiscale SVs at different resolutions with multiple linearized physical processes to capture fast-growing perturbations from mesoscale to synoptic scale in target areas, and combining these SVs by using a Gaussian sampling method with amplitude coefficients to generate initial perturbations. Following that, the energy norm, energy spectrum, and structure of multiscale SVs and their impact on GEPS are analyzed based on a batch experiment in different seasons. The results show that the multiscale SV initial perturbations possess more energy and capture more mesoscale uncertainties than the traditional single-SV method. Meanwhile, multiscale SV initial perturbations can reflect the strongest dynamical instability in target areas. Compared with single-scale SVs, their performance in global ensemble prediction is shown to (i) improve the relationship between the ensemble spread and the root-mean-square error and (ii) provide better probability forecast skill for atmospheric circulation during the late forecast period and for short- to medium-range precipitation. This study provides scientific evidence and application foundations for the design and development of a multiscale SV initial perturbation method for the GEPS.
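The spread-error relationship named in point (i) can be made concrete with a small sketch. It assumes one common convention: spread is the square root of the domain-averaged ensemble variance, and error is the RMSE of the ensemble mean against a verifying analysis. The function name and the toy numbers are hypothetical and are not taken from CMA-GEPS.

```python
import math

def ensemble_spread_and_rmse(members, truth):
    """Return (spread, rmse) for one forecast time.

    members: list of member forecasts, each a list of grid-point values.
    truth:   list of verifying values on the same grid points.
    """
    n_members = len(members)
    n_points = len(truth)
    # Ensemble mean at each grid point.
    mean = [sum(m[i] for m in members) / n_members for i in range(n_points)]
    # RMSE of the ensemble mean against the verifying values.
    rmse = math.sqrt(sum((mean[i] - truth[i]) ** 2 for i in range(n_points)) / n_points)
    # Unbiased ensemble variance at each grid point, then domain average.
    variance = [
        sum((m[i] - mean[i]) ** 2 for m in members) / (n_members - 1)
        for i in range(n_points)
    ]
    spread = math.sqrt(sum(variance) / n_points)
    return spread, rmse

spread, rmse = ensemble_spread_and_rmse([[1.0, 2.0], [3.0, 4.0]], [2.0, 2.0])
print(spread, rmse)
```

Under this convention, a well-calibrated ensemble is usually expected to show spread close to RMSE when averaged over many cases; a ratio far below one suggests an under-dispersive ensemble.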
With the advancement of artificial intelligence, traffic forecasting is gaining more and more interest in optimizing route planning and enhancing service quality. Traffic volume is an influential parameter for planning and operating traffic structures. This study proposes an improved ensemble-based deep learning method to solve traffic volume prediction problems. A set of optimal hyperparameters is also applied to the suggested approach to improve the performance of the learning process. The fusion of these methodologies aims to harness ensemble empirical mode decomposition’s capacity to discern complex traffic patterns and long short-term memory’s proficiency in learning temporal relationships. First, a dataset for automatic vehicle identification is obtained and utilized in the preprocessing stage of the ensemble empirical mode decomposition model. Second, traffic volume is predicted using the long short-term memory algorithm. Next, the study employs a trial-and-error approach to select a set of optimal hyperparameters, including the lookback window, the number of neurons in the hidden layers, and the gradient descent optimization. Finally, the fusion of the obtained results leads to a final traffic volume prediction. The experimental results show that the proposed method outperforms other benchmarks regarding various evaluation measures, including mean absolute error, root mean squared error, mean absolute percentage error, and R-squared. The achieved R-squared value reaches an impressive 98%, while the other evaluation indices surpass those of the competing methods. These findings highlight the accuracy of traffic pattern prediction. Consequently, this offers promising prospects for enhancing transportation management systems and urban infrastructure planning.
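The four evaluation measures listed in this abstract have standard definitions, sketched below for reference. The function name and the sample values are hypothetical; the abstract does not say how the study computed them, and MAPE here assumes no true value is zero.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent) and R-squared for two equal-length lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the true value, so it is undefined when any y_true is 0.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": r2}

print(regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0]))
```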
Redundancy, correlation, feature irrelevance, and missing samples are just a few problems that make it difficult to analyze software defect data. Additionally, it might be challenging to maintain an even distribution of data relating to both defective and non-defective software. In the majority of experimental situations, data from the latter class predominate in the dataset. The objective of this review study is to demonstrate the effectiveness of combining ensemble learning and feature selection in improving the performance of defect classification. Besides the successful feature selection approach, a novel variant of the ensemble learning technique is analyzed to address the challenges of feature redundancy and data imbalance, providing robustness in the classification process. To overcome these problems and lessen their impact on the fault classification performance, the authors carefully integrate effective feature selection with ensemble learning models. Forward selection demonstrates that a significant area under the receiver operating characteristic (ROC) curve can be attributed to only a small subset of features. The Greedy forward selection (GFS) technique outperformed Pearson’s correlation method when evaluating feature selection techniques on the datasets. Ensemble learners, such as random forests (RF) and the proposed average probability ensemble (APE), demonstrate greater resistance to the impact of weak features when compared to weighted support vector machines (W-SVMs) and extreme learning machines (ELM). Furthermore, in the case of the NASA and Java datasets, the enhanced average probability ensemble model, which incorporates the Greedy forward selection technique with the average probability ensemble model, achieved a remarkably high area under the ROC curve. It approached a value of 1.0, indicating exceptional performance. This review emphasizes the importance of meticulously selecting attributes in a software dataset to accurately classify defective components. In addition, the suggested ensemble learning model successfully addressed the aforementioned problems with software data and produced outstanding classification performance.
Big data and information and communication technologies can be important to the effectiveness of smart cities. With growing attention on smart city sustainability, the development of data-driven smart cities has recently gained attention as a vital means of addressing sustainability problems. Real-time monitoring of pollution allows local authorities to analyze the present traffic condition of cities and make decisions. Air pollution is a main environmental problem in smart city environments. The influence of the deep learning (DL) approach has quickly increased and penetrated almost every domain, including air pollution forecasting. Therefore, this article develops a new Coot Optimization Algorithm with an Ensemble Deep Learning based Air Pollution Prediction (COAEDL-APP) system for Sustainable Smart Cities. The proposed COAEDL-APP algorithm accurately forecasts air quality in the sustainable smart city environment. To achieve this, the COAEDL-APP technique initially applies a linear scaling normalization (LSN) approach to pre-process the input data. For air quality prediction, an ensemble of three DL models is used, namely autoencoder (AE), long short-term memory (LSTM), and deep belief network (DBN). Furthermore, a COA-based hyperparameter tuning procedure is designed to adjust the hyperparameter values of the DL models. The COAEDL-APP algorithm was tested on an air quality database, and the outcomes showed improved performance of the COAEDL-APP algorithm over other existing systems, with a maximum accuracy of 98.34%.
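The abstract does not spell out the linear scaling normalization step. A minimal sketch follows, assuming it means ordinary min-max scaling of each input feature to a fixed range; the function name `linear_scale` and the default range of 0 to 1 are illustrative choices, not details from the paper.

```python
def linear_scale(values, lo=0.0, hi=1.0):
    """Min-max scale a list of numbers linearly into [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant feature carries no information; map it to the lower bound.
        return [lo for _ in values]
    return [lo + (v - v_min) / (v_max - v_min) * (hi - lo) for v in values]

print(linear_scale([10.0, 20.0, 30.0]))  # [0.0, 0.5, 1.0]
```

In practice the minimum and maximum would be taken from the training split only and reused on the test split, so that no information leaks from test data into preprocessing.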
The accurate prediction of soybean yield is of great significance for agricultural production, monitoring and early warning. Although previous studies have used machine learning algorithms to predict soybean yield based on meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms to improve the prediction accuracy through ensemble learning algorithms has not been studied in depth. This study used and analyzed various daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang–Huai region), covering 34 years. Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as the base models to establish a high-precision and highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated by using the five-year sliding prediction and four regression indicators of the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield based on the stacking model in 173 counties showed that the predictions reflect the spatiotemporal distribution of soybean yield in detail, and the mean absolute percentage error (MAPE) was less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
Based on a simple coupled Lorenz model, we investigate how to assess a suitable initial perturbation scheme for ensemble forecasting in a multiscale system involving slow dynamics and fast dynamics. Four initial perturbation approaches are used in the ensemble forecasting experiments: the random perturbation (RP), the bred vector (BV), the ensemble transform Kalman filter (ETKF), and the nonlinear local Lyapunov vector (NLLV) methods. Results show that, regardless of the method used, the ensemble averages behave indistinguishably from the control forecasts during the first few time steps. Due to different error growth in different time-scale systems, the ensemble averages perform better than the control forecast after very short lead times in a fast subsystem but after a relatively long period of time in a slow subsystem. Due to the coupled dynamic processes, the addition of perturbations to fast variables or to slow variables can contribute to an improvement in the forecasting skill for fast variables and slow variables. Regarding the initial perturbation approaches, the NLLVs show higher forecasting skill than the BVs or RPs overall. The NLLVs and ETKFs had nearly equivalent prediction skill, but NLLVs performed best by a narrow margin. In particular, when adding perturbations to slow variables, the independent perturbations (NLLVs and ETKFs) perform much better in ensemble prediction. These results have simple implications for a real coupled air–sea model. For the prediction of oceanic variables, using independent perturbations (NLLVs) and adding perturbations to oceanic variables are expected to result in better performance in the ensemble prediction.
Target maneuver trajectory prediction is an important prerequisite for air combat situation awareness and maneuver decision-making. However, how to use the large amount of trajectory data generated by air combat confrontation training to achieve real-time and accurate prediction of target maneuver trajectory is an urgent problem to be solved. To solve this problem, this paper proposes a hybrid algorithm based on transfer learning, online learning, ensemble learning, regularization technology, a target maneuvering segmentation point recognition algorithm, and the Volterra series, abbreviated as AERTrOS-Volterra. First, the model makes full use of the large number of trajectory samples generated by air combat confrontation training and constructs a Tr-Volterra algorithm framework suitable for air combat target maneuver trajectory prediction, which realizes the extraction of effective information from the historical trajectory data. Second, in order to improve the real-time online prediction accuracy and robustness of the prediction model in complex electromagnetic environments, a robust regularized online sequential Volterra prediction model is proposed on the basis of the Tr-Volterra algorithm framework by integrating an online learning method, regularization technology, and an inverse weighting calculation method based on the a priori error. Finally, inspired by the preferable performance of model ensembles, an ensemble learning scheme is also incorporated into the proposed algorithm, which adaptively updates the ensemble prediction model according to the performance of the model on real-time samples and the recognition results of target maneuvering segmentation points, including the adaptation of model weights, the adaptation of parameters, and the dynamic inclusion and removal of models. Compared with many existing time series prediction methods, the newly proposed target maneuver trajectory prediction algorithm can fully mine the prior knowledge contained in the historical data to assist the current prediction. The rationality and effectiveness of the proposed algorithm are verified by simulation on three sets of chaotic time series data and a set of real target maneuver trajectory data.
The quality of the air we breathe during the course of our daily lives has a significant impact on our health and well-being as individuals. Unfortunately, personal air quality measurement remains challenging. In this study, we investigate the use of first-person photos for the prediction of air quality. The main idea is to harness the power of a generalized stacking approach and the importance of haze features extracted from first-person images to create an efficient new stacking model called AirStackNet for air pollution prediction. AirStackNet consists of two layers and four regression models, where the first layer generates meta-data from Light Gradient Boosting Machine (Light-GBM), Extreme Gradient Boosting Regression (XGBoost) and CatBoost Regression (CatBoost), whereas the second layer computes the final prediction from the meta-data of the first layer using Extra Tree Regression (ET). The performance of the proposed AirStackNet model is validated using the public Personal Air Quality Dataset (PAQD). Our experiments are evaluated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R2), Mean Squared Error (MSE), Root Mean Squared Logarithmic Error (RMSLE), and Mean Absolute Percentage Error (MAPE). Experimental results indicate that the proposed AirStackNet model not only effectively improves air pollution prediction performance by overcoming the bias-variance tradeoff, but also outperforms baseline and state-of-the-art models.
The software engineering field has long focused on creating high-quality software despite limited resources. Detecting defects before the testing stage of software development can enable quality assurance engineers to concentrate on problematic modules rather than all the modules. This approach can enhance the quality of the final product while lowering development costs. Identifying defective modules early on can allow for early corrections and ensure the timely delivery of a high-quality product that satisfies customers and instills greater confidence in the development team. This process is known as software defect prediction, and it can improve end-product quality while reducing the cost of testing and maintenance. This study proposes a software defect prediction system that utilizes data fusion, feature selection, and ensemble machine learning fusion techniques. A novel filter-based metric selection technique is proposed in the framework to select the optimum features. A three-step nested approach is presented for predicting defective modules to achieve high accuracy. In the first step, three supervised machine learning techniques, namely Decision Tree, Support Vector Machines, and Naïve Bayes, are used to detect faulty modules. The second step involves integrating the predictive accuracy of these classification techniques through three ensemble machine learning methods: Bagging, Voting, and Stacking. Finally, in the third step, a fuzzy logic technique is employed to integrate the predictive accuracy of the ensemble machine learning techniques. The experiments are performed on a fused software defect dataset to ensure that the developed fused ensemble model can perform effectively on diverse datasets. Five NASA datasets are integrated to create the fused dataset: MW1, PC1, PC3, PC4, and CM1. According to the results, the proposed system exhibited superior performance to other advanced techniques for predicting software defects, achieving a remarkable accuracy rate of 92.08%.
Cardiovascular disease is among the top five fatal diseases that affect lives worldwide. Therefore, its early prediction and detection are crucial, allowing one to take proper and necessary measures at earlier stages. Machine learning (ML) techniques are used to assist healthcare providers in better diagnosing heart disease. This study employed three boosting algorithms, namely, gradient boost, XGBoost, and AdaBoost, to predict heart disease. The dataset contained heart disease-related clinical features and was sourced from the publicly available UCI ML repository. Exploratory data analysis is performed to characterize the data samples using descriptive and inferential statistics. Specifically, it was carried out to identify and replace outliers using the interquartile range and to detect and replace missing values using the imputation method. Results were recorded before and after the data preprocessing techniques were applied. Out of all the algorithms, gradient boosting achieved the highest accuracy rate of 92.20% for the proposed model. The proposed model yielded better results with gradient boosting in terms of precision, recall, and F1-score. It attained better prediction performance than the existing works and can be used for other diseases that share common features using transfer learning.
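The two preprocessing steps described here can be sketched as follows. The abstract does not state what outliers were replaced with or which imputation method was used, so this sketch makes two assumptions: outliers are clipped to the Tukey fences (1.5 times the IQR beyond the quartiles), and missing values are filled with the median. Both function names are hypothetical.

```python
import statistics

def clip_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]

def impute_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

print(clip_outliers_iqr([1, 2, 3, 4, 100]))  # [1, 2, 3, 4, 7.0]
print(impute_median([1, None, 3]))           # [1, 2.0, 3]
```

Clipping keeps every sample in the dataset, which matters for small clinical datasets; dropping the rows or substituting the median are alternatives the study may have used instead.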
Nowadays, quantum machine learning is attracting great interest in a wide range of fields due to its potential superior performance and capabilities. The massive increase in computational capacity and speed of quantum computers can lead to a quantum leap in the healthcare field. Heart disease seriously threatens human health since it is the leading cause of death worldwide. Quantum machine learning methods can propose effective solutions to predict heart disease and aid in early diagnosis. In this study, an ensemble machine learning model based on quantum machine learning classifiers is proposed to predict the risk of heart disease. The proposed model is a bagging ensemble learning model where a quantum support vector classifier was used as a base classifier. Furthermore, in order to make the model’s outcomes more explainable, the importance of every single feature in the prediction is computed and visualized using the SHapley Additive exPlanations (SHAP) framework. In the experimental study, other stand-alone quantum classifiers, namely, Quantum Support Vector Classifier (QSVC), Quantum Neural Network (QNN), and Variational Quantum Classifier (VQC), are applied and compared with classical machine learning classifiers such as Support Vector Machine (SVM) and Artificial Neural Network (ANN). The experimental results on the Cleveland dataset reveal the superiority of QSVC compared to the others, which explains its use in the proposed bagging model. The Bagging-QSVC model outperforms all aforementioned classifiers with an accuracy of 90.16% while showing great competitiveness compared to some state-of-the-art models using the same dataset. The results of the study indicate that quantum machine learning classifiers perform better than classical machine learning classifiers in predicting heart disease. In addition, the study reveals that the bagging ensemble learning technique is effective in improving the prediction accuracy of quantum classifiers.
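Bagging itself is independent of the base classifier, so its mechanics can be sketched without any quantum component. The example below trains each model on a bootstrap resample and combines predictions by majority vote. The nearest-centroid learner is a classical stand-in chosen for brevity; it is not the QSVC used in the study, and all names and data are hypothetical.

```python
import random
from collections import Counter

def fit_nearest_centroid(X, y):
    """Stand-in base learner (classical): predict the class with the closest centroid."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )
    return predict

def fit_bagging(X, y, fit_base, n_estimators=11, seed=0):
    """Train n_estimators base models on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]
    return predict

X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y = [0, 0, 0, 1, 1, 1]
bagged = fit_bagging(X, y, fit_nearest_centroid)
print(bagged([0.05]), bagged([5.05]))
```

An odd number of estimators avoids tied votes in the binary case. Any classifier exposing the same fit-then-predict shape could be passed as `fit_base`.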
Diabetic Eye Disease (DED) is a fundamental cause of blindness in human beings. Different techniques have been proposed to forecast and examine the stages of Diabetic Retinopathy (DR). Machine Learning (ML) and Deep Learning (DL) algorithms are the predominant techniques for analyzing images of DR. Even though some solutions have been adopted to address DR, an efficient and accurate DR prediction method is still needed to improve performance. In this work, a hybrid technique is proposed for the classification and prediction of DR. The proposed hybrid technique consists of Ensemble Learning (EL), a 2-Dimensional Convolutional Neural Network (2D-CNN), Transfer Learning (TL), and a correlation method. Initially, the Stochastic Gradient Boosting (SGB) EL method was used to predict DR. Secondly, the boosting-based EL method was used to predict DR from images. Thirdly, the 2D-CNN was applied to categorize the various stages of DR images. Finally, TL was adopted to transfer the classification prediction to training datasets. When this TL was applied, a new prediction feature was added. In the experiment, the proposed technique achieved 97.8% accuracy in the prediction of DR images and 98% accuracy in the grading of images. The experiment was also extended to measure the sensitivity (99.6%) and specificity (97.3%) metrics. The predicted accuracy rate was compared with existing methods.
Massive open online courses (MOOCs) have become a way of online learning across the world in the past few years. However, the extremely high dropout rate has brought many challenges to the development of online learning. Most of the current methods have low accuracy and poor generalization ability when dealing with high-dimensional dropout features. They focus on the analysis of the learning scores and check results of online courses, but neglect phased student behaviors. Besides, the status of student participation at a given moment is necessarily impacted by the prior status of learning. To address these issues, this paper proposes an ensemble learning model for early dropout prediction (ELM-EDP) that integrates attention-based document representation as a vector (A-Doc2vec), feature learning of course difficulty, and weighted soft voting ensemble with heterogeneous classifiers (WSV-HC). First, A-Doc2vec is proposed to learn sequence features of student behaviors of watching lecture videos and completing course assignments. It also captures the relationship between courses and videos. Then, a feature learning method is proposed to reduce the interference caused by differences in course difficulty on the dropout prediction. Finally, WSV-HC is proposed to highlight the benefits of the integration strategies of boosting and bagging. Experiments on the MOOCCube2020 dataset show that our ELM-EDP achieves better results on Accuracy, Precision, Recall, and F1.
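Weighted soft voting, the combination rule named in WSV-HC, averages the class probabilities produced by each classifier using per-classifier weights and picks the class with the highest combined probability. The sketch below shows only that rule; how the paper chooses its weights is not stated in the abstract, so the weights and probabilities here are made up for illustration.

```python
def weighted_soft_vote(probas, weights):
    """Combine per-classifier class probabilities for one sample.

    probas:  list over classifiers; each entry is a list of class probabilities.
    weights: one non-negative weight per classifier.
    Returns (predicted_class_index, combined_probabilities).
    """
    total = sum(weights)
    n_classes = len(probas[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=combined.__getitem__)
    return label, combined

# Three hypothetical classifiers scoring one student: [P(stay), P(dropout)].
probas = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
print(weighted_soft_vote(probas, [3, 1, 1]))
print(weighted_soft_vote(probas, [8, 1, 1]))
```

With weights 3, 1, 1 the two weaker classifiers still outvote the first; raising its weight to 8 flips the decision, which is the lever a weighting scheme gives over plain averaging.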
Widely used deep neural networks currently face limitations in achieving optimal performance for purchase intention prediction due to constraints on data volume and hyperparameter selection. To address this issue, based on the deep forest algorithm and further integrating evolutionary ensemble learning methods, this paper proposes a novel Deep Adaptive Evolutionary Ensemble (DAEE) model. This model introduces model diversity into the cascade layer, allowing it to adaptively adjust its structure to accommodate complex and evolving purchasing behavior patterns. Moreover, this paper optimizes the methods of obtaining feature vectors, enhancement vectors, and prediction results within the deep forest algorithm to enhance the model’s predictive accuracy. Results demonstrate that the improved deep forest model not only possesses higher robustness but also shows an increase of 5.02% in AUC value compared to the baseline model. Furthermore, its training is 6 times faster than that of deep models, and compared to other improved models, its accuracy is higher by 0.9%.
The rapid growth of the Chinese economy has fueled the expansion of power grids. Power transformers are key equipment in power grid projects, and their price changes have a significant impact on cost control. However, the prices of power transformer materials manifest as nonsmooth and nonlinear sequences. Hence, estimating the acquisition costs of power grid projects is difficult, hindering the normal operation of power engineering construction. To more accurately predict the price of power transformer materials, this study proposes a method based on complementary ensemble empirical mode decomposition (CEEMD) and a gated recurrent unit (GRU) network. First, the CEEMD decomposed the price series into multiple intrinsic mode functions (IMFs). The IMFs were then clustered into several aggregated sequences based on the sample entropy of each IMF. Next, an empirical wavelet transform (EWT) was applied to the aggregated sequences with a large sample entropy, and the multiple subsequences obtained from the decomposition were predicted by the GRU model. The GRU model was used to directly predict the aggregated sequences with a small sample entropy. In this study, we used authentic historical pricing data for power transformer materials to validate the proposed approach. The empirical findings demonstrated the efficacy of our method across both datasets, with mean absolute percentage errors (MAPEs) of less than 1% and 3%. This approach holds significant reference value for future research in the field of power transformer material price prediction.
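Sample entropy, which this method uses to group IMFs by complexity, can be computed with the standard definition sketched below: the negative log of the ratio between matching template pairs of length m+1 and of length m, under a Chebyshev-distance tolerance of r times the standard deviation. The defaults m=2 and r=0.2 are conventional choices, not values reported in the abstract, and the function name is hypothetical.

```python
import math

def sample_entropy(series, m=2, r=0.2):
    """Sample entropy SampEn(m, r) of a numeric sequence.

    Returns float('inf') when no template pairs match, where the ratio is undefined.
    """
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    tol = r * sd  # tolerance scales with the series' standard deviation

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i excludes self-matches
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= tol:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)

print(sample_entropy([0.0, 1.0] * 10))  # perfectly periodic series, entropy 0
```

A regular, low-entropy IMF would go straight to the GRU in the scheme described above, while a high-entropy IMF would first be decomposed further.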
Precipitation is a significant index for measuring the degree of drought and flood in a region, and it directly reflects local natural changes and the ecological environment. Accurately grasping the characteristics and patterns of precipitation change is very important for effectively reducing disaster losses and maintaining stable socioeconomic development. In order to accurately predict precipitation, a new precipitation prediction model based on an extreme learning machine ensemble (ELME) is proposed. The integrated model is based on the extreme learning machine (ELM) with different kernel functions and supporting parameters, and the submodel with the minimum root mean square error (RMSE) is selected to fit the test data. Due to the complex mechanisms and factors affecting precipitation change, the data have strong uncertainty and significant nonlinear variation characteristics. The mean generating function (MGF) is used to generate the continuation factor matrix, principal component analysis is employed to reduce the dimension of the continuation matrix, and the effective data features are extracted. Finally, the ELME prediction model is established using the precipitation data of Liuzhou city for June, July and August from 1951 to 2021, and a comparative experiment is carried out using ELM, a long short-term memory neural network (LSTM) and a back propagation neural network based on a genetic algorithm (GA-BP). The experimental results show that the prediction accuracy of the proposed method is significantly higher than that of the other models, and that it has high stability and reliability, which provides a reliable method for precipitation prediction.
When customers use software, defects may occur that can be removed in updated versions of the software. Hence, the present work elaborates a robust examination of cross-project software defect prediction through an innovative hybrid machine learning framework. The proposed technique combines an advanced deep neural network architecture with ensemble models such as Support Vector Machine (SVM), Random Forest (RF), and XGBoost. The study evaluates performance on multiple software projects, such as CM1, JM1, KC1, and PC1, using datasets from the PROMISE Software Engineering Repository. Three hybrid models are compared: Hybrid Model-1 (SVM, RandomForest, XGBoost, Neural Network), Hybrid Model-2 (GradientBoosting, DecisionTree, LogisticRegression, Neural Network), and Hybrid Model-3 (KNeighbors, GaussianNB, Support Vector Classification (SVC), Neural Network). Hybrid Model-3 surpasses the others in terms of recall, F1-score, accuracy, ROC AUC, and precision. The presented work offers valuable insights into the effectiveness of hybrid techniques for cross-project defect prediction, providing a comparative perspective on early defect identification and mitigation strategies.
The 21-yr ensemble predictions of model precipitation and circulation in the East Asian and western North Pacific (Asia-Pacific) summer monsoon region (0°-50°N, 100°-150°E) were evaluated in nine different AGCMs used in the Asia-Pacific Economic Cooperation Climate Center (APCC) multi-model ensemble seasonal prediction system. The analysis indicates that the precipitation anomaly patterns of the model ensemble predictions are substantially different from their observed counterparts in this region, but the summer monsoon circulations are reasonably predicted. For example, all models reproduce well the interannual variability of the western North Pacific monsoon index (WNPMI) defined by 850 hPa winds, but they fail to predict the relationship between the WNPMI and precipitation anomalies. The interannual variability of the 500 hPa geopotential height (GPH) is well predicted by the models, in contrast to precipitation anomalies. On the basis of these model performances and the relationship between the interannual variations of 500 hPa GPH and precipitation anomalies, we developed a statistical scheme to downscale the summer monsoon precipitation anomaly on the basis of EOF and singular value decomposition (SVD). In this scheme, the three leading EOF modes of the 500 hPa GPH anomaly fields predicted by the models are first corrected by the linear regression between the principal components in each model and in observation, respectively. Then, the corrected model GPH is chosen as the predictor to downscale the precipitation anomaly field, which is assembled from the forecasted expansion coefficients of model 500 hPa GPH and the three leading SVD modes of the observed precipitation anomaly corresponding to the prediction of model 500 hPa GPH during a 19-year training period. The cross-validated forecasts suggest that this downscaling scheme has the potential to improve the forecast skill of the precipitation anomaly in the South China Sea, western North Pacific, and East Asia Pacific regions, where the anomaly correlation coefficient (ACC) is improved by 0.14, corresponding to a 10.4% reduction in RMSE relative to the conventional multi-model ensemble (MME) forecast.
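The two skill scores quoted above can be illustrated with a small sketch. It assumes the centered form of the anomaly correlation coefficient and flat lists of grid-point anomalies with no area weighting; the abstract does not say which ACC variant or weighting the authors used, so the function names and conventions here are illustrative only.

```python
import math


def anomaly_correlation(forecast, observed):
    """Centered ACC between forecast and observed anomaly values."""
    n = len(forecast)
    f_mean = sum(forecast) / n
    o_mean = sum(observed) / n
    cov = sum((f - f_mean) * (o - o_mean) for f, o in zip(forecast, observed))
    f_var = sum((f - f_mean) ** 2 for f in forecast)
    o_var = sum((o - o_mean) ** 2 for o in observed)
    return cov / math.sqrt(f_var * o_var)


def rmse(forecast, observed):
    """Root-mean-square error over all points."""
    n = len(forecast)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)


if __name__ == "__main__":
    fc = [0.5, -1.0, 2.0, 0.0]   # toy anomalies, not real data
    ob = [0.4, -0.8, 1.5, 0.3]
    print(anomaly_correlation(fc, ob), rmse(fc, ob))
```

A forecast can have a high ACC and still a large RMSE if its anomaly pattern is right but its amplitude is off, which is why the two scores are usually reported together.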
An initial conditions (ICs) perturbation method was developed with the aim of improving an operational regional ensemble prediction system (REPS). Three issues were identified and investigated: (1) the impact of perturbation scale on the ensemble spread and forecast skill of the REPS; (2) the scale characteristics of the IC perturbations of the REPS; and (3) whether the REPS's skill could be improved by adding large-scale information to the IC perturbations. Numerical experiments were conducted to reveal the impact of perturbation scale on the ensemble spread and forecast skill. The scales of IC perturbations from the REPS and an operational global ensemble prediction system (GEPS) were analyzed. A "multi-scale blending" (MSB) IC perturbation scheme was developed, and the main findings can be summarized as follows: the growth rates of the ensemble spread of the REPS are sensitive to the scale of the IC perturbations; the ensemble forecast skills can benefit from large-scale perturbations; the global ensemble IC perturbations exhibit more power at larger scales, while the regional ensemble IC perturbations contain more power at smaller scales; the MSB method can generate IC perturbations by combining the small-scale component from the REPS and the large-scale component from the GEPS; the energy norm growth of the MSB-generated perturbations can be appropriate at all forecast lead times; and the MSB-based REPS shows higher skill than the original system, as determined by ensemble forecast verification.
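The blending idea (large scales from the global perturbation, small scales from the regional one) can be sketched on a one-dimensional periodic field. This is only an illustration: the abstract does not describe the filter actually used, so a simple moving average stands in for the scale separation, and `half_width` is a hypothetical tuning parameter.

```python
def low_pass(field, half_width):
    """Periodic moving average; keeps scales longer than the window."""
    n = len(field)
    window = 2 * half_width + 1
    return [
        sum(field[(i + k) % n] for k in range(-half_width, half_width + 1)) / window
        for i in range(n)
    ]


def blend(regional, global_field, half_width):
    """Large-scale part of the global field plus small-scale part of the regional field."""
    large_scale = low_pass(global_field, half_width)
    regional_large = low_pass(regional, half_width)
    small_scale = [r - s for r, s in zip(regional, regional_large)]
    return [l + s for l, s in zip(large_scale, small_scale)]


if __name__ == "__main__":
    regional = [0.0, 1.0, -1.0, 2.0, 0.5, -0.5, 1.5, 0.0]  # toy perturbation values
    global_field = [v + 3.0 for v in regional]
    print(blend(regional, global_field, half_width=1))
```

An operational scheme would more plausibly use a spectral or other well-characterized filter on two-dimensional fields; the moving average is chosen here only because it keeps the example short and dependency-free.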
Funding: Supported by the Joint Funds of the Chinese National Natural Science Foundation (NSFC) (Grant No. U2242213), the National Key Research and Development (R&D) Program of the Ministry of Science and Technology of China (Grant No. 2021YFC3000902), and the National Science Foundation for Young Scholars (Grant No. 42205166).
Abstract: Ensemble prediction is widely used to represent the uncertainty of a single deterministic Numerical Weather Prediction (NWP) caused by errors in initial conditions (ICs). The traditional Singular Vector (SV) initial perturbation method tends to capture only synoptic-scale initial uncertainty rather than mesoscale uncertainty in global ensemble prediction. To address this issue, a multiscale SV initial perturbation method based on the China Meteorological Administration Global Ensemble Prediction System (CMA-GEPS) is proposed to quantify multiscale initial uncertainty. The multiscale SV initial perturbation approach entails calculating multiscale SVs at different resolutions with multiple linearized physical processes to capture fast-growing perturbations from the mesoscale to the synoptic scale in target areas, and combining these SVs by using a Gaussian sampling method with amplitude coefficients to generate initial perturbations. The energy norm, energy spectrum, and structure of the multiscale SVs and their impact on the GEPS are then analyzed based on a batch experiment in different seasons. The results show that the multiscale SV initial perturbations possess more energy and capture more mesoscale uncertainties than the traditional single-SV method. Meanwhile, multiscale SV initial perturbations can reflect the strongest dynamical instability in target areas. Compared with single-scale SVs, their performance in global ensemble prediction is shown to (i) improve the relationship between the ensemble spread and the root-mean-square error and (ii) provide better probability forecast skill for atmospheric circulation during the late forecast period and for short- to medium-range precipitation. This study provides scientific evidence and application foundations for the design and development of a multiscale SV initial perturbation method for the GEPS.
Abstract: With the advancement of artificial intelligence, traffic forecasting is gaining increasing interest for optimizing route planning and enhancing service quality. Traffic volume is an influential parameter for planning and operating traffic structures. This study proposes an improved ensemble-based deep learning method to solve traffic volume prediction problems. A set of optimal hyperparameters is also applied to the suggested approach to improve the performance of the learning process. The fusion of these methodologies aims to harness ensemble empirical mode decomposition's capacity to discern complex traffic patterns and long short-term memory's proficiency in learning temporal relationships. First, a dataset for automatic vehicle identification is obtained and used in the preprocessing stage of the ensemble empirical mode decomposition model. Second, traffic volume is predicted using the long short-term memory algorithm. Next, the study employs a trial-and-error approach to select a set of optimal hyperparameters, including the lookback window, the number of neurons in the hidden layers, and the gradient descent optimization. Finally, the fusion of the obtained results leads to a final traffic volume prediction. The experimental results show that the proposed method outperforms other benchmarks on various evaluation measures, including mean absolute error, root mean squared error, mean absolute percentage error, and R-squared. The achieved R-squared value reaches 98%, and the other evaluation indices surpass those of the competing methods. These findings highlight the accuracy of traffic pattern prediction and offer promising prospects for enhancing transportation management systems and urban infrastructure planning.
Abstract: Redundancy, correlation, feature irrelevance, and missing samples are just a few of the problems that make software defect data difficult to analyze. Additionally, it can be challenging to maintain an even distribution of data between defective and non-defective software; in most experimental situations, data from the non-defective class predominate in the dataset. The objective of this review study is to demonstrate the effectiveness of combining ensemble learning and feature selection in improving the performance of defect classification. Besides the successful feature selection approach, a novel variant of the ensemble learning technique is analyzed to address the challenges of feature redundancy and data imbalance, providing robustness in the classification process. To overcome these problems and lessen their impact on fault classification performance, the authors carefully integrate effective feature selection with ensemble learning models. Forward selection demonstrates that a significant area under the receiver operating characteristic (ROC) curve can be attributed to only a small subset of features. The Greedy forward selection (GFS) technique outperformed Pearson's correlation method when feature selection techniques were evaluated on the datasets. Ensemble learners, such as random forests (RF) and the proposed average probability ensemble (APE), demonstrate greater resistance to the impact of weak features than weighted support vector machines (W-SVMs) and extreme learning machines (ELM). Furthermore, on the NASA and Java datasets, the enhanced average probability ensemble model, which combines the Greedy forward selection technique with the average probability ensemble model, achieved a remarkably high area under the ROC curve, approaching a value of 1.0. This review emphasizes the importance of meticulously selecting attributes in a software dataset to accurately classify defective components. In addition, the suggested ensemble learning model successfully addressed the aforementioned problems with software data and produced outstanding classification performance.
Funding: Funded by the Deanship of Scientific Research (DSR), King Abdulaziz University (KAU), Jeddah, Saudi Arabia, under Grant No. (IFPIP:631-612-1443).
Abstract: Big data and information and communication technologies can be important to the effectiveness of smart cities. Given the strong focus on smart city sustainability, the development of data-driven smart cities has recently gained attention as a vital technology for addressing sustainability problems. Real-time monitoring of pollution allows local authorities to analyze the present traffic conditions of cities and make decisions. Air pollution is a main environmental problem in smart city environments. The influence of the deep learning (DL) approach has quickly increased and penetrated almost every domain, including air pollution forecasting. Therefore, this article develops a new Coot Optimization Algorithm with an Ensemble Deep Learning based Air Pollution Prediction (COAEDL-APP) system for Sustainable Smart Cities. The proposed COAEDL-APP algorithm forecasts air quality in the sustainable smart city environment. To achieve this, the COAEDL-APP technique first applies a linear scaling normalization (LSN) approach to pre-process the input data. For air quality prediction, an ensemble of three DL models is used, namely an autoencoder (AE), long short-term memory (LSTM), and a deep belief network (DBN). Furthermore, a COA-based hyperparameter tuning procedure is designed to adjust the hyperparameter values of the DL models. The COAEDL-APP algorithm was tested in simulation on an air quality database, and the outcomes showed improved performance of the COAEDL-APP algorithm over other existing systems, with a maximum accuracy of 98.34%.
Funding: Supported by the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2016-AII).
Abstract: The accurate prediction of soybean yield is of great significance for agricultural production, monitoring, and early warning. Although previous studies have used machine learning algorithms to predict soybean yield based on meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms to improve prediction accuracy through ensemble learning algorithms has not been studied in depth. This study used and analyzed various daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang–Huai region), covering 34 years. Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as the base models to establish a high-precision and highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated using five-year sliding prediction and four regression indicators for the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield based on the stacking model in the 173 counties showed that the predictions can reflect the spatiotemporal distribution of soybean yield in detail, and the mean absolute percentage error (MAPE) was less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
Funding: Jointly supported by the National Natural Science Foundation of China (Grant Nos. 42225501, 42105059).
Abstract: Based on a simple coupled Lorenz model, we investigate how to assess a suitable initial perturbation scheme for ensemble forecasting in a multiscale system involving slow dynamics and fast dynamics. Four initial perturbation approaches are used in the ensemble forecasting experiments: the random perturbation (RP), the bred vector (BV), the ensemble transform Kalman filter (ETKF), and the nonlinear local Lyapunov vector (NLLV) methods. Results show that, regardless of the method used, the ensemble averages behave indistinguishably from the control forecasts during the first few time steps. Because error growth differs between systems with different time scales, the ensemble averages perform better than the control forecast after very short lead times in the fast subsystem but only after a relatively long period of time in the slow subsystem. Owing to the coupled dynamic processes, adding perturbations to either fast variables or slow variables can improve the forecasting skill for both fast variables and slow variables. Regarding the initial perturbation approaches, the NLLVs show higher forecasting skill than the BVs or RPs overall. The NLLVs and ETKFs have nearly equivalent prediction skill, but the NLLVs perform best by a narrow margin. In particular, when perturbations are added to slow variables, the independent perturbations (NLLVs and ETKFs) perform much better in ensemble prediction. These results carry straightforward implications for a real coupled air–sea model. For the prediction of oceanic variables, using independent perturbations (NLLVs) and adding perturbations to oceanic variables are expected to result in better ensemble prediction performance.
Funding: Supported by the Fundamental Research Funds for the Air Force Engineering University under Grant No. XZJK2019040.
Abstract: Target maneuver trajectory prediction is an important prerequisite for air combat situation awareness and maneuver decision-making. However, how to use the large amount of trajectory data generated by air combat confrontation training to achieve real-time and accurate prediction of target maneuver trajectories is an urgent problem. To solve this problem, this paper proposes a hybrid algorithm based on transfer learning, online learning, ensemble learning, regularization technology, a target maneuvering segmentation point recognition algorithm, and the Volterra series, abbreviated as AERTrOS-Volterra. First, the model makes full use of the large number of trajectory samples generated by air combat confrontation training and constructs a Tr-Volterra algorithm framework suitable for air combat target maneuver trajectory prediction, which extracts effective information from the historical trajectory data. Second, to improve the real-time online prediction accuracy and robustness of the prediction model in complex electromagnetic environments, a robust regularized online sequential Volterra prediction model is proposed on the basis of the Tr-Volterra framework by integrating an online learning method, regularization technology, and an inverse weighting calculation method based on the a priori error. Finally, inspired by the favorable performance of model ensembles, an ensemble learning scheme is also incorporated into the proposed algorithm. It adaptively updates the ensemble prediction model according to the performance of the model on real-time samples and the recognition results of target maneuvering segmentation points, including adaptation of model weights, adaptation of parameters, and dynamic inclusion and removal of models. Compared with many existing time series prediction methods, the newly proposed target maneuver trajectory prediction algorithm can fully mine the prior knowledge contained in the historical data to assist the current prediction. The rationality and effectiveness of the proposed algorithm are verified by simulation on three chaotic time series datasets and one real target maneuver trajectory dataset.
Funding: Funded by the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia, through project number PNU-DRI-RI-20-033.
Abstract: The quality of the air we breathe during the course of our daily lives has a significant impact on our health and well-being as individuals. Unfortunately, personal air quality measurement remains challenging. In this study, we investigate the use of first-person photos for the prediction of air quality. The main idea is to harness the power of a generalized stacking approach and the importance of haze features extracted from first-person images to create an efficient new stacking model called AirStackNet for air pollution prediction. AirStackNet consists of two layers and four regression models: the first layer generates meta-data from Light Gradient Boosting Machine (Light-GBM), Extreme Gradient Boosting Regression (XGBoost), and CatBoost Regression (CatBoost), whereas the second layer computes the final prediction from the meta-data of the first layer using Extra Tree Regression (ET). The performance of the proposed AirStackNet model is validated using the public Personal Air Quality Dataset (PAQD). Our experiments are evaluated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R²), Mean Squared Error (MSE), Root Mean Squared Logarithmic Error (RMSLE), and Mean Absolute Percentage Error (MAPE). Experimental results indicate that the proposed AirStackNet model not only effectively improves air pollution prediction performance by overcoming the bias-variance tradeoff but also outperforms baseline and state-of-the-art models.
Funding: Supported by the Center for Cyber-Physical Systems, Khalifa University, under Grant 8474000137-RC1-C2PS-T5.
Abstract: The software engineering field has long focused on creating high-quality software despite limited resources. Detecting defects before the testing stage of software development can enable quality assurance engineers to concentrate on problematic modules rather than all the modules. This approach can enhance the quality of the final product while lowering development costs. Identifying defective modules early on can allow for early corrections and ensure the timely delivery of a high-quality product that satisfies customers and instills greater confidence in the development team. This process is known as software defect prediction, and it can improve end-product quality while reducing the cost of testing and maintenance. This study proposes a software defect prediction system that utilizes data fusion, feature selection, and ensemble machine learning fusion techniques. A novel filter-based metric selection technique is proposed in the framework to select the optimum features. A three-step nested approach is presented for predicting defective modules to achieve high accuracy. In the first step, three supervised machine learning techniques (Decision Tree, Support Vector Machines, and Naïve Bayes) are used to detect faulty modules. The second step involves integrating the predictive accuracy of these classification techniques through three ensemble machine learning methods: Bagging, Voting, and Stacking. Finally, in the third step, a fuzzy logic technique is employed to integrate the predictive accuracy of the ensemble machine learning techniques. The experiments are performed on a fused software defect dataset to ensure that the developed fused ensemble model can perform effectively on diverse datasets. Five NASA datasets are integrated to create the fused dataset: MW1, PC1, PC3, PC4, and CM1. According to the results, the proposed system outperformed other advanced techniques for predicting software defects, achieving an accuracy of 92.08%.
Funding: This work was supported by a National Research Foundation of Korea grant funded by the Korean Government (MSIT) (NRF-2020R1A2B5B02002478).
Abstract: Cardiovascular disease is among the top five fatal diseases that affect lives worldwide. Therefore, its early prediction and detection are crucial, allowing proper and necessary measures to be taken at earlier stages. Machine learning (ML) techniques are used to assist healthcare providers in better diagnosing heart disease. This study employed three boosting algorithms, namely gradient boosting, XGBoost, and AdaBoost, to predict heart disease. The dataset contained heart disease-related clinical features and was sourced from the publicly available UCI ML repository. Exploratory data analysis was performed to characterize the data samples using descriptive and inferential statistics. Specifically, it was carried out to identify and replace outliers using the interquartile range and to detect and replace missing values using an imputation method. Results were recorded before and after the data preprocessing techniques were applied. Of all the algorithms, gradient boosting achieved the highest accuracy, 92.20%, for the proposed model. The proposed model also yielded better results with gradient boosting in terms of precision, recall, and F1-score. It attained better prediction performance than existing works and can be used, through transfer learning, for other diseases that share common features.
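The preprocessing described above (interquartile-range outlier handling and imputation of missing values) can be sketched as follows. The abstract does not say what outliers or missing values are replaced with, so this example assumes the median of the in-range values in both cases and the conventional 1.5 × IQR fences; those choices, and the function names, are illustrative only.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Lower and upper fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_column(values, k=1.5):
    """Replace missing entries (None) and IQR outliers with the inlier median."""
    present = [v for v in values if v is not None]
    low, high = iqr_fences(present, k)
    inliers = [v for v in present if low <= v <= high]
    fill = statistics.median(inliers)
    return [
        v if v is not None and low <= v <= high else fill
        for v in values
    ]


if __name__ == "__main__":
    cholesterol = [1, 2, 3, None, 4, 5, 6, 7, 8, 100]  # toy values, not real data
    print(clean_column(cholesterol))
```

Other reasonable choices include clipping outliers to the fences or imputing with the mean; which is appropriate depends on the feature's distribution.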
Funding: Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R196), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Quantum machine learning is attracting great interest in a wide range of fields due to its potentially superior performance and capabilities. The massive increase in the computational capacity and speed of quantum computers can lead to a quantum leap in the healthcare field. Heart disease seriously threatens human health, since it is the leading cause of death worldwide. Quantum machine learning methods can offer effective solutions for predicting heart disease and aiding early diagnosis. In this study, an ensemble machine learning model based on quantum machine learning classifiers is proposed to predict the risk of heart disease. The proposed model is a bagging ensemble learning model in which a quantum support vector classifier is used as the base classifier. Furthermore, to make the model's outcomes more explainable, the importance of every single feature in the prediction is computed and visualized using the SHapley Additive exPlanations (SHAP) framework. In the experimental study, other stand-alone quantum classifiers, namely the Quantum Support Vector Classifier (QSVC), Quantum Neural Network (QNN), and Variational Quantum Classifier (VQC), are applied and compared with classical machine learning classifiers such as the Support Vector Machine (SVM) and Artificial Neural Network (ANN). The experimental results on the Cleveland dataset reveal the superiority of QSVC over the others, which explains its use in the proposed bagging model. The Bagging-QSVC model outperforms all the aforementioned classifiers with an accuracy of 90.16% while remaining highly competitive with some state-of-the-art models on the same dataset. The results of the study indicate that quantum machine learning classifiers perform better than classical machine learning classifiers in predicting heart disease. In addition, the study reveals that the bagging ensemble learning technique is effective in improving the prediction accuracy of quantum classifiers.
Abstract: Diabetic Eye Disease (DED) is a fundamental cause of blindness in human beings. Different techniques have been proposed to forecast and examine the stages in the prognostication of Diabetic Retinopathy (DR). Machine Learning (ML) and Deep Learning (DL) algorithms are the predominant techniques for projecting and exploring DR images. Although some solutions have been adopted to address DR, an efficient and accurate DR prediction method is still needed to improve performance. In this work, a hybrid technique is proposed for the classification and prediction of DR. The proposed hybrid technique consists of Ensemble Learning (EL), a Two-Dimensional Convolutional Neural Network (2D-CNN), Transfer Learning (TL), and a correlation method. Initially, the Stochastic Gradient Boosting (SGB) EL method was used to predict DR. Secondly, the boosting-based EL method was used to predict DR from images. Thirdly, the 2D-CNN was applied to categorize the various stages of DR images. Finally, TL was adopted to transfer the classification prediction to the training datasets. When TL was applied, a new prediction feature was added. In the experiment, the proposed technique achieved 97.8% accuracy in the prediction of DR images and 98% accuracy in the grading of images. The experiment was also extended to measure the sensitivity (99.6%) and specificity (97.3%) metrics. The predicted accuracy rate was compared with existing methods.
Funding: Supported by the National Natural Science Foundation of China (No. 61772231), the Natural Science Foundation of Shandong Province (No. ZR2022LZH016 & No. ZR2017MF025), the Project of Shandong Provincial Social Science Program (No. 18CHLJ39), the Shandong Provincial Key R&D Program of China (No. 2021CXGC010103), the Shandong Provincial Teaching Research Project of Graduate Education (No. SDYAL2022102 & No. SDYJG21034), and the Teaching Research Project of University of Jinan (No. JZ2212).
Abstract: Massive open online courses (MOOCs) have become a common way of learning online across the world in the past few years. However, their extremely high dropout rate has brought many challenges to the development of online learning. Most current methods have low accuracy and poor generalization ability when dealing with high-dimensional dropout features. They focus on analyzing the learning scores and check results of online courses but neglect phased student behaviors. Moreover, the status of student participation at a given moment is necessarily affected by the prior status of learning. To address these issues, this paper proposes an ensemble learning model for early dropout prediction (ELM-EDP) that integrates attention-based document representation as a vector (A-Doc2vec), feature learning of course difficulty, and a weighted soft voting ensemble with heterogeneous classifiers (WSV-HC). First, A-Doc2vec is proposed to learn sequence features of student behaviors when watching lecture videos and completing course assignments. It also captures the relationship between courses and videos. Then, a feature learning method is proposed to reduce the interference that differences in course difficulty cause in dropout prediction. Finally, WSV-HC is proposed to highlight the benefits of the boosting and bagging integration strategies. Experiments on the MOOCCube2020 dataset show that ELM-EDP achieves better results in terms of accuracy, precision, recall, and F1.
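Weighted soft voting, the combination rule named in WSV-HC, averages the class probabilities of several classifiers using per-classifier weights and picks the class with the largest weighted average. The sketch below shows only that generic rule; how the paper derives its weights and which heterogeneous classifiers it uses are not stated in the abstract, so the probabilities and weights here are made-up placeholders.

```python
def weighted_soft_vote(class_probs, weights):
    """Combine per-classifier class probabilities with a weighted average.

    class_probs[m][c] is classifier m's probability for class c.
    Returns (predicted_class_index, combined_probabilities).
    """
    if len(class_probs) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    n_classes = len(class_probs[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, class_probs)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: combined[c])
    return predicted, combined


if __name__ == "__main__":
    # Three hypothetical classifiers scoring one student: [P(stay), P(drop out)]
    probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
    print(weighted_soft_vote(probs, weights=[3, 1, 1]))
```

With equal weights this reduces to plain soft voting; raising one classifier's weight lets a stronger model dominate when the others disagree with it.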
Funding: Supported by the Ningxia Key R&D Program (Key) Project (2023BDE02001); the Ningxia Key R&D Program (Talent Introduction Special) Project (2022YCZX0013); the North Minzu University 2022 School-Level Research Platform "Digital Agriculture Empowering Ningxia Rural Revitalization Innovation Team" (Project Number: 2022PT_S10); the Yinchuan City School-Enterprise Joint Innovation Project (2022XQZD009); and the "Innovation Team for Imaging and Intelligent Information Processing" of the National Ethnic Affairs Commission.
Abstract: Widely used deep neural networks currently face limitations in achieving optimal performance for purchase intention prediction because of constraints on data volume and hyperparameter selection. To address this issue, this paper proposes a novel Deep Adaptive Evolutionary Ensemble (DAEE) model that builds on the deep forest algorithm and further integrates evolutionary ensemble learning methods. The model introduces model diversity into the cascade layer, allowing it to adaptively adjust its structure to accommodate complex and evolving purchasing behavior patterns. Moreover, this paper optimizes the methods of obtaining feature vectors, enhancement vectors, and prediction results within the deep forest algorithm to enhance the model's predictive accuracy. Results demonstrate that the improved deep forest model not only is more robust but also shows an increase of 5.02% in AUC value compared to the baseline model. Furthermore, its training runtime speed is 6 times faster than that of deep models, and its accuracy is 0.9% higher than that of other improved models.
Funding: Supported by the China Southern Power Grid Science and Technology Innovation Research Project (000000KK52220052).
Abstract: The rapid growth of the Chinese economy has fueled the expansion of power grids. Power transformers are key equipment in power grid projects, and their price changes have a significant impact on cost control. However, the prices of power transformer materials manifest as nonsmooth and nonlinear sequences. Hence, estimating the acquisition costs of power grid projects is difficult, which hinders the normal operation of power engineering construction. To predict the price of power transformer materials more accurately, this study proposes a method based on complementary ensemble empirical mode decomposition (CEEMD) and a gated recurrent unit (GRU) network. First, CEEMD decomposes the price series into multiple intrinsic mode functions (IMFs). The IMFs are clustered into several aggregated sequences based on the sample entropy of each IMF. Then, an empirical wavelet transform (EWT) is applied to the aggregated sequences with a large sample entropy, and the multiple subsequences obtained from the decomposition are predicted by the GRU model. The GRU model is used to directly predict the aggregated sequences with a small sample entropy. In this study, we used authentic historical pricing data for power transformer materials to validate the proposed approach. The empirical findings demonstrated the efficacy of our method across both datasets, with mean absolute percentage errors (MAPEs) of less than 1% and 3%, respectively. This approach holds significant reference value for future research in the field of power transformer material price prediction.
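Sample entropy, which the method above uses to group the IMFs, measures how often patterns of length m that match within a tolerance r still match when extended to length m + 1; regular sequences score near zero and irregular ones score higher. The sketch below follows the standard definition with Chebyshev distance and self-matches excluded. The defaults m = 2 and an absolute tolerance r are assumptions (r is often set to a fraction of the series' standard deviation), since the abstract gives no parameter values.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy -ln(A / B) with Chebyshev distance and no self-matches.

    B counts matching template pairs of length m, A those of length m + 1,
    both over the first len(series) - m starting positions.
    """
    n = len(series)
    starts = range(n - m)

    def count_matches(length):
        templates = [series[i:i + length] for i in starts]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                distance = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if distance <= r:
                    matches += 1
        return matches

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined: no matches found at one of the lengths
    return -math.log(a / b)


if __name__ == "__main__":
    regular = [0, 1] * 10
    irregular = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
    print(sample_entropy(regular, r=0.1), sample_entropy(irregular, r=0.1))
```

In a pipeline like the one described, components whose sample entropy exceeds some threshold would be routed to the further EWT decomposition; that threshold is a design choice not given in the abstract.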
Funding: Funded by the Scientific Research Project of Guangxi Normal University of Science and Technology, grant number GXKS2022QN024.
Abstract: Precipitation is a significant index for measuring the degree of drought and flood in a region, and it directly reflects local natural changes and the ecological environment. Accurately grasping the characteristics and laws of precipitation change is very important for effectively reducing disaster losses and maintaining stable socioeconomic development. To predict precipitation accurately, a new precipitation prediction model based on an extreme learning machine ensemble (ELME) is proposed. The integrated model is based on the extreme learning machine (ELM) with different kernel functions and supporting parameters, and the submodel with the minimum root mean square error (RMSE) is found to fit the test data. Because of the complex mechanisms and factors affecting precipitation change, the data have strong uncertainty and significant nonlinear variation characteristics. The mean generating function (MGF) is used to generate the continuation factor matrix, principal component analysis is employed to reduce the dimension of the continuation matrix, and the effective data features are extracted. Finally, the ELME prediction model is established using the precipitation data of Liuzhou city for June, July, and August from 1951 to 2021, and a comparative experiment is carried out against ELM, a long short-term memory (LSTM) neural network, and a back propagation neural network based on a genetic algorithm (GA-BP). The experimental results show that the prediction accuracy of the proposed method is significantly higher than that of the other models and that it has high stability and reliability, providing a reliable method for precipitation prediction.
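The idea of training several ELM variants and keeping the one with the smallest RMSE can be sketched as follows. This is an illustration under stated assumptions, not the authors' ELME: it uses a basic single-hidden-layer ELM with random input weights and a least-squares output layer, and it varies the hidden activation function in place of the kernel functions mentioned in the abstract. All function names and the hidden-layer size are hypothetical.

```python
import numpy as np

# Candidate hidden-layer activations (assumed stand-ins for the kernels)
ACTIVATIONS = {
    "tanh": np.tanh,
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "sine": np.sin,
}

def train_elm(x, y, n_hidden, activation, rng):
    """Fit an ELM: random fixed hidden layer, least-squares output weights."""
    weights = rng.standard_normal((x.shape[1], n_hidden))
    bias = rng.standard_normal(n_hidden)
    hidden = activation(x @ weights + bias)
    beta, *_ = np.linalg.lstsq(hidden, y, rcond=None)
    return weights, bias, beta

def predict_elm(model, x, activation):
    weights, bias, beta = model
    return activation(x @ weights + bias) @ beta

def rmse(predicted, observed):
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def select_best_elm(x_train, y_train, x_val, y_val, n_hidden=40, seed=0):
    """Train one ELM per activation and return the name with the lowest RMSE."""
    rng = np.random.default_rng(seed)
    scores = {}
    for name, activation in ACTIVATIONS.items():
        model = train_elm(x_train, y_train, n_hidden, activation, rng)
        scores[name] = rmse(predict_elm(model, x_val, activation), y_val)
    best = min(scores, key=scores.get)
    return best, scores
```

Selecting on a held-out split rather than on the training data is a design choice made here to avoid rewarding overfitted submodels; the abstract itself only states that the minimum-RMSE submodel is chosen.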
Abstract: When customers use software, defects may surface that can be removed in updated versions. Hence, the present work examines cross-project software defect prediction through an innovative hybrid machine learning framework. The proposed technique combines an advanced deep neural network architecture with ensemble models such as Support Vector Machine (SVM), Random Forest (RF), and XGBoost. The study evaluates performance on multiple software projects, namely CM1, JM1, KC1, and PC1, using datasets from the PROMISE Software Engineering Repository. Three hybrid models are compared: Hybrid Model-1 (SVM, RandomForest, XGBoost, Neural Network), Hybrid Model-2 (GradientBoosting, DecisionTree, LogisticRegression, Neural Network), and Hybrid Model-3 (KNeighbors, GaussianNB, Support Vector Classification (SVC), Neural Network). Hybrid Model-3 surpasses the others in terms of recall, F1-score, accuracy, ROC AUC, and precision. The presented work offers valuable insights into the effectiveness of hybrid techniques for cross-project defect prediction, providing a comparative perspective on early defect identification and mitigation strategies.
Funding: This work was jointly supported by the National Natural Science Foundation of China (NSFC), Grant Nos. 90711003 and 40375014, the program of GYHY200706005, and the APCC Visiting Scientist Program.
Abstract: The 21-yr ensemble predictions of model precipitation and circulation in the East Asian and western North Pacific (Asia-Pacific) summer monsoon region (0°-50°N, 100°-150°E) were evaluated in nine different AGCMs used in the Asia-Pacific Economic Cooperation Climate Center (APCC) multi-model ensemble seasonal prediction system. The analysis indicates that the precipitation anomaly patterns of the model ensemble predictions are substantially different from their observed counterparts in this region, but the summer monsoon circulations are reasonably predicted. For example, all models reproduce well the interannual variability of the western North Pacific monsoon index (WNPMI) defined by 850 hPa winds, but they fail to predict the relationship between the WNPMI and precipitation anomalies. In contrast to precipitation anomalies, the interannual variability of the 500 hPa geopotential height (GPH) is well predicted by the models. On the basis of such model performance and the relationship between the interannual variations of 500 hPa GPH and precipitation anomalies, we developed a statistical scheme to downscale the summer monsoon precipitation anomaly on the basis of EOF and singular value decomposition (SVD). In this scheme, the three leading EOF modes of the 500 hPa GPH anomaly fields predicted by the models are first corrected by the linear regression between the principal components in each model and observation, respectively. Then, the corrected model GPH is chosen as the predictor to downscale the precipitation anomaly field, which is assembled from the forecasted expansion coefficients of model 500 hPa GPH and the three leading SVD modes of the observed precipitation anomaly corresponding to the prediction of model 500 hPa GPH during a 19-year training period.
The cross-validated forecasts suggest that this downscaling scheme has the potential to improve the forecast skill of the precipitation anomaly in the South China Sea, western North Pacific, and East Asia Pacific regions, where the anomaly correlation coefficient (ACC) is improved by 0.14, corresponding to a 10.4% reduction in RMSE relative to the conventional multi-model ensemble (MME) forecast.
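The EOF step in the downscaling scheme can be illustrated with a small sketch that extracts the leading modes of an anomaly field through an SVD. It assumes the field is stored as a (time, grid point) array and omits area weighting and the regression-based correction described in the abstract; `eof_analysis` is a hypothetical name.

```python
import numpy as np

def eof_analysis(field, n_modes=3):
    """Leading EOFs of a (time, space) field via SVD of its anomalies.

    Returns spatial patterns (n_modes, space), principal components
    (time, n_modes), and the variance fraction explained by each mode.
    """
    field = np.asarray(field, dtype=float)
    anomalies = field - field.mean(axis=0)  # remove the time mean per grid point
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]
    pcs = u[:, :n_modes] * s[:n_modes]
    variance_fraction = s[:n_modes] ** 2 / np.sum(s ** 2)
    return eofs, pcs, variance_fraction
```

With `n_modes=3` this mirrors the three leading modes retained in the scheme; the product `pcs @ eofs` gives the truncated reconstruction of the anomaly field.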
Funding: Supported by the National Natural Science Foundation of China (Grant No. 91437113), the Special Fund for Meteorological Scientific Research in the Public Interest (Grant Nos. GYHY201506007 and GYHY201006015), the National 973 Program of China (Grant Nos. 2012CB417204 and 2012CB955200), and the Scientific Research & Innovation Projects for Academic Degree Students of Ordinary Universities of Jiangsu (Grant No. KYLX 0827).
Abstract: An initial condition (IC) perturbation method was developed with the aim of improving an operational regional ensemble prediction system (REPS). Three issues were identified and investigated: (1) the impacts of perturbation scale on the ensemble spread and forecast skill of the REPS; (2) the scale characteristic of the IC perturbations of the REPS; and (3) whether the REPS's skill could be improved by adding large-scale information to the IC perturbations. Numerical experiments were conducted to reveal the impact of perturbation scale on the ensemble spread and forecast skill. The scales of IC perturbations from the REPS and an operational global ensemble prediction system (GEPS) were analyzed. A "multi-scale blending" (MSB) IC perturbation scheme was developed, and the main findings can be summarized as follows: the growth rates of the ensemble spread of the REPS are sensitive to the scale of the IC perturbations; the ensemble forecast skills can benefit from large-scale perturbations; the global ensemble IC perturbations exhibit more power at larger scales, while the regional ensemble IC perturbations contain more power at smaller scales; the MSB method can generate IC perturbations by combining the small-scale component from the REPS and the large-scale component from the GEPS; the energy norm growth of the MSB-generated perturbations can be appropriate at all forecast lead times; and the MSB-based REPS shows higher skill than the original system, as determined by ensemble forecast verification.
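The blending idea, large scales from the global ensemble and small scales from the regional ensemble, can be illustrated in one dimension. This sketch assumes a sharp spectral cutoff applied with a real FFT on a periodic 1-D field; the abstract does not specify the filter or cutoff used, so `blend_perturbations` and `cutoff_wavenumber` are hypothetical.

```python
import numpy as np

def blend_perturbations(global_pert, regional_pert, cutoff_wavenumber):
    """Combine large scales of one 1-D perturbation with small scales of another.

    Wavenumbers up to and including cutoff_wavenumber are taken from
    global_pert; all higher wavenumbers are taken from regional_pert.
    Both inputs must be sampled on the same periodic grid.
    """
    global_pert = np.asarray(global_pert, dtype=float)
    regional_pert = np.asarray(regional_pert, dtype=float)
    global_hat = np.fft.rfft(global_pert)
    regional_hat = np.fft.rfft(regional_pert)
    wavenumbers = np.arange(global_hat.size)
    large_scale = wavenumbers <= cutoff_wavenumber  # assumed sharp low-pass mask
    blended_hat = np.where(large_scale, global_hat, regional_hat)
    return np.fft.irfft(blended_hat, n=global_pert.size)
```

A smooth transition between the two spectral bands would avoid the ringing that a sharp cutoff can introduce; the sharp mask is used here only to keep the example short.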