Accurate wind power forecasting is critical for system integration and stability as reliance on renewable energy grows. Traditional approaches frequently struggle with complex data and non-linear relationships. This article presents a novel hybrid ensemble learning approach that is based on rigorous requirements engineering concepts. The approach identifies significant parameters influencing forecasting accuracy by evaluating real-time Modern-Era Retrospective Analysis for Research and Applications (MERRA2) data from several European wind farms using in-depth stakeholder research and requirements elicitation. Ensemble learning is used to develop a robust model, while a temporal convolutional network handles time-series complexities and data gaps. The ensemble-temporal neural network is enhanced by varying input parameters, including training layers, hidden and dropout layers, and activation and loss functions. The proposed framework is further analyzed by comparing it with state-of-the-art forecasting models in terms of Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The energy efficiency performance indicators show that the proposed model achieves error reductions of approximately 16.67%, 28.57%, and 81.92% for MAE, and 38.46%, 17.65%, and 90.78% for RMSE, for MERRA wind farms 1, 2, and 3, respectively, compared to other existing methods. These quantitative results show the effectiveness of the proposed model, with MAE values ranging from 0.0010 to 0.0156 and RMSE values ranging from 0.0014 to 0.0174. This work highlights the effectiveness of requirements engineering in wind power forecasting, leading to enhanced forecast accuracy and grid stability and ultimately paving the way for more sustainable energy solutions.
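The MAE, RMSE, and error-reduction figures quoted in the wind power abstract can be illustrated with a minimal sketch. The abstract does not state how the reduction percentage is defined; the relative-reduction formula below is an assumption, and all forecast values and function names are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def error_reduction_pct(baseline_error, proposed_error):
    """Relative reduction of an error metric, in percent (assumed definition)."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error

# Hypothetical normalized power values, for illustration only.
observed = [0.20, 0.35, 0.50, 0.42]
baseline = [0.26, 0.29, 0.56, 0.36]  # assumed forecast of a reference model
proposed = [0.23, 0.32, 0.53, 0.39]  # assumed forecast of an improved model

mae_gain = error_reduction_pct(mae(observed, baseline), mae(observed, proposed))
rmse_gain = error_reduction_pct(rmse(observed, baseline), rmse(observed, proposed))
print(f"MAE reduction: {mae_gain:.2f}%, RMSE reduction: {rmse_gain:.2f}%")
```

Under this assumed definition, a baseline MAE of 0.0012 against a proposed MAE of 0.0010 would correspond to a reduction of about 16.67%.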
Advanced Driver Assistance Systems (ADAS) technologies can assist drivers or be part of automatic driving systems to support the driving process and improve the level of safety and comfort on the road. The Traffic Sign Recognition System (TSRS) is one of the most important components of ADAS. Among the challenges with TSRS is recognizing road signs with the highest accuracy and the shortest processing time. Accordingly, this paper introduces a new real-time methodology for recognizing Speed Limit (SL) signs based on a trio of developed modules. Firstly, the Speed Limit Detection (SLD) module uses the Haar Cascade technique to generate a new SL detector in order to localize SL signs within captured frames. Secondly, the Speed Limit Classification (SLC) module, featuring machine learning classifiers alongside a newly developed model called DeepSL, harnesses the power of a CNN architecture to extract intricate features from speed limit sign images, ensuring efficient and precise recognition. In addition, a new Speed Limit Classifiers Fusion (SLCF) module has been developed by combining trained ML classifiers and the DeepSL model using the Dempster-Shafer (DS) theory of belief functions and ensemble learning's voting technique. Through rigorous software and hardware validation processes, the proposed methodology has achieved F1 scores of 99.98% and 99.96% for DS theory and the voting method, respectively. Furthermore, a prototype encompassing all components demonstrates outstanding reliability and efficacy, with processing times of 150 ms on the Raspberry Pi board and 81.5 ms on the Jetson Nano board, marking a significant advancement in TSRS technology.
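The Dempster-Shafer fusion step mentioned for the SLCF module can be sketched with Dempster's rule of combination. This is a minimal illustration and not the authors' implementation: the two classifiers, their mass values, and the two speed-limit classes are hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical masses from two classifiers over the classes "30" and "50".
S30, S50 = frozenset({"30"}), frozenset({"50"})
EITHER = S30 | S50  # mass left on "don't know"
m_cnn = {S30: 0.7, S50: 0.2, EITHER: 0.1}
m_svm = {S30: 0.6, S50: 0.3, EITHER: 0.1}

fused = dempster_combine(m_cnn, m_svm)
decision = max((k for k in fused if len(k) == 1), key=fused.get)
print(sorted(decision), round(fused[S30], 4))
```

Because both hypothetical sources lean towards "30", the fused belief in that class is higher than either source's individual mass, which is the usual motivation for this kind of fusion.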
This study presents a layered generalization ensemble model for next-generation radio mobiles, focusing on supervised channel estimation approaches. Channel estimation typically involves the insertion of pilot symbols with a well-balanced rhythm and suitable layout. The model, called Stacked Generalization for Channel Estimation (SGCE), aims to enhance channel estimation performance by eliminating pilot insertion and improving throughput. The SGCE model incorporates six machine learning methods: random forest (RF), gradient boosting machine (GB), light gradient boosting machine (LGBM), support vector regression (SVR), extremely randomized tree (ERT), and extreme gradient boosting (XGB). Meta-data generated from five models (RF, GB, LGBM, SVR, and ERT) are used by the XGB model to predict the channel coefficients accurately. To validate the modeling performance, we employ the leave-one-out cross-validation (LOOCV) approach, where each observation in turn serves as the validation set while the remaining observations act as the training set. The results show that SGCE achieves higher mean and median accuracy than the separate models. SGCE achieves an average accuracy of 98.4%, a precision of 98.1%, and the highest F1-score of 98.5%, accurately predicting channel coefficients. Furthermore, our proposed method outperforms prior traditional and intelligent techniques in terms of throughput and bit error rate. SGCE's superior performance highlights its efficacy in optimizing channel estimation. It can effectively predict channel coefficients and contribute to enhancing the overall efficiency of radio mobile systems. Through extensive experimentation and evaluation, we demonstrate that SGCE improves channel estimation performance, surpassing previous techniques. Accordingly, SGCE's capabilities have significant implications for optimizing channel estimation in modern communication systems.
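The leave-one-out cross-validation procedure described for SGCE can be sketched as follows. To keep the example self-contained, a 1-nearest-neighbour classifier stands in for the stacked model; that substitution, the toy data, and the function names are assumptions for illustration.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]

def loocv_accuracy(xs, ys, predict):
    """Hold out each observation once, train on the rest, and average the hits."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical two-feature observations with binary labels.
xs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 1.1), (0.9, 1.2), (1.2, 0.9)]
ys = [0, 0, 0, 1, 1, 1]

accuracy = loocv_accuracy(xs, ys, nearest_neighbor_predict)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

LOOCV trains one model per observation, so its cost grows linearly with the dataset size; that is affordable for small datasets and expensive for large ones.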
The software development process depends largely on accurately identifying both essential and optional features. Initially, user needs are typically expressed in free-form language, requiring significant time and human resources to translate them into clear functional and non-functional requirements. To address this challenge, various machine learning (ML) methods have been explored to automate the understanding of these requirements, aiming to reduce time and human effort. However, existing techniques often struggle with complex instructions and large-scale projects. In our study, we introduce an innovative approach known as the Functional and Non-functional Requirements Classifier (FNRC). By combining the traditional random forest algorithm with the Accuracy Sliding Window (ASW) technique, we develop optimal sub-ensembles that surpass the initial classifier's accuracy while using fewer trees. Experimental results demonstrate that our FNRC methodology performs robustly across different datasets, achieving a balanced Precision of 75% on the PROMISE dataset and a Recall of 85% on the CCHIT dataset. The F-measure remains around 64% on both datasets, highlighting FNRC's ability to balance precision and recall effectively in diverse scenarios. These findings contribute to more accurate and efficient software development processes, increasing the probability of achieving successful project outcomes.
As the global demand for renewable energy grows, solar energy is gaining attention as a clean, sustainable energy source. Accurate assessment of solar energy resources is crucial for the siting and design of photovoltaic power plants. This study proposes an integrated deep learning-based photovoltaic resource assessment method. Ensemble learning and deep learning methods are fused for photovoltaic resource assessment for the first time. The proposed method combines the random forest, gated recurrent unit, and long short-term memory to effectively improve the accuracy and reliability of photovoltaic resource assessment. The proposed method has strong adaptability and high accuracy even in the photovoltaic resource assessment of complex terrain and landscape. The experimental results show that the proposed method outperforms the comparison algorithm in all evaluation indexes, indicating that it has higher accuracy and reliability in photovoltaic resource assessment, with improved generalization performance over traditional single algorithms.
The accurate prediction of soybean yield is of great significance for agricultural production, monitoring, and early warning. Although previous studies have used machine learning algorithms to predict soybean yield based on meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms to improve the prediction accuracy through ensemble learning algorithms has not been studied in depth. This study used and analyzed various daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang–Huai region), covering 34 years. Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as the base models to establish a high-precision and highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated using the five-year sliding prediction and four regression indicators for the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield based on the stacking model in 173 counties showed that the predictions reflect the spatiotemporal distribution of soybean yield in detail, and the mean absolute percentage error (MAPE) was less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
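For reference, the MAPE reported in the soybean yield abstract is conventionally defined as below. The symbols are assumptions introduced here: y_i denotes an observed yield, \hat{y}_i the corresponding predicted yield, and n the number of county-year observations.

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
```

Under this definition, a MAPE below 5% means that the predictions deviate from the observed yields by less than 5% on average, relative to the observed values.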
In recent years, cervical cancer has been one of the most common diseases, and it can occur in women of any age. It is the deadliest disease because no symptoms appear until it is diagnosed at the last stage. For women at a certain age, proper screening for cervical cancer is advisable. In most underdeveloped nations, frequent screening for cervical cancer is very difficult. Data mining and machine learning methodologies help widely in finding the important causes of cervical cancer. The proposed work describes a multi-class classification approach implemented for the dataset using a Support Vector Machine (SVM) and the perception learning method. Most classification algorithms are designed for solving binary classification problems. Using a heuristic approach, the problem is addressed as a multiclass classification problem. A Gradient Boosting Machine (GBM) is also used in the implementation in order to increase the classifier accuracy. The proposed model is evaluated in terms of accuracy and sensitivity, and it is found to work well in identifying the risk factors of cervical cancer.
The back propagation (BP) neural network method is widely used in bathymetry based on multispectral satellite imagery. However, the classical BP neural network method faces a potential problem because it easily falls into a local minimum, leading to model training failure. This study confirmed that the local minimum problem of the BP neural network method exists in the bathymetry field and cannot be ignored. Furthermore, to solve the local minimum problem of the BP neural network method, a bathymetry method based on a BP neural network and ensemble learning (BPEL) is proposed. First, the remote sensing imagery and training samples were used as input datasets, and the BP method was used as the base learner to produce multiple water depth inversion results. Then, a new ensemble strategy, namely the minimum outlying degree method, was proposed and used to integrate the water depth inversion results. Finally, an ensemble bathymetric map was acquired. Anda Reef, northeastern Jiuzhang Atoll, and the Pingtan coastal zone were selected as test cases to validate the proposed method. Compared with the BP neural network method, the BPEL method reduces the root-mean-square error by 0.65–2.84 m and the average relative error by 16%–46% at most in the three test cases. The results showed that the proposed BPEL method could solve the local minimum problem of the BP neural network method and obtain highly robust and accurate bathymetric maps.
Proper waste management models using recent technologies such as computer vision, machine learning (ML), and deep learning (DL) are needed to effectively handle the massive and increasing quantity of waste. Waste classification therefore becomes a crucial topic, as it helps to categorize waste as hazardous or non-hazardous and thereby assists decision making in the waste management process. This study concentrates on the design of a hazardous waste detection and classification using ensemble learning (HWDC-EL) technique to reduce toxicity and improve human health. The goal of the HWDC-EL technique is to detect multiple classes of waste, particularly hazardous and non-hazardous waste. The HWDC-EL technique involves an ensemble of three feature extractors combined using the Model Averaging technique, namely discrete local binary patterns (DLBP), EfficientNet, and DenseNet121. In addition, flower pollination algorithm (FPA) based hyperparameter optimizers are used to optimally adjust the parameters involved in the EfficientNet and DenseNet121 models. Moreover, a weighted voting-based ensemble classifier is derived using three machine learning algorithms, namely support vector machine (SVM), extreme learning machine (ELM), and gradient boosting tree (GBT). The performance of the HWDC-EL technique is tested using a benchmark Garbage dataset, and it obtains a maximum accuracy of 98.85%.
Accurate regulation of two-dimensional materials has become an effective strategy to develop a wide range of catalytic applications. The introduction of heterogeneous components has a significant impact on the performance of materials, which makes it difficult to discover and understand the structure-property relationships at the atomic level. Here, we developed a novel and efficient ensemble learning classifier with the synthetic minority oversampling technique (SMOTE) to discover all possible arsenene catalysts with implanted heteroatoms for the hydrogen evolution reaction (HER). A total of 850 doped arsenenes were collected as a database, and 140 modified arsenene materials with different doping atoms and doping sites were identified as promising candidate catalysts for HER, with a machine learning prediction accuracy of 81%. Based on the results of machine learning, we proposed 13 low-cost and easily synthesized two-dimensional Fe-doped arsenene catalytic materials that are expected to contribute to highly efficient HER. The proposed ensemble method achieved high prediction accuracy while predicting Gibbs free energies millions of times faster and requiring only a small amount of data. This study indicates that the presented ensemble learning classifier is capable of screening highly efficient catalysts and can be further extended to predict other two-dimensional catalysts with delicate regulation.
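The core idea of SMOTE, which the arsenene abstract uses to handle class imbalance, is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration: the feature vectors, the function name, and the parameter defaults are hypothetical and do not come from the paper.

```python
import random

def smote_samples(minority, k=2, n_new=4, seed=0):
    """Generate n_new synthetic points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p))
        )[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Hypothetical two-dimensional descriptors of the minority class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_samples(minority)
print(new_points)
```

Each synthetic point lies on the line segment between two real minority samples, so the new data stay inside the region already occupied by the minority class.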
The emergence of deepfake videos in recent years has made image falsification a real danger. In a deepfake, a person's face and emotions in a video or speech are substituted with a different face or voice, employing deep learning to analyze speech or emotional content. Because these videos are often so convincing, manipulation is challenging to spot. Social media are the most frequent and dangerous targets since they are weak outlets that are open to extortion or slander of individuals. In earlier times, it was not so easy to alter videos, which required domain expertise and time. Nowadays, generating fake videos has become easier, and the videos have a high level of realism. Deepfakes are forgeries and altered visual data that appear in still photos or video footage. Numerous automatic identification systems have been developed to solve this issue; however, they are constrained to certain datasets and perform poorly when applied to different datasets. This study aims to develop an ensemble learning model utilizing a convolutional neural network (CNN) to handle deepfakes or Face2Face. We employed ensemble learning, a technique that combines many classifiers to achieve higher prediction performance than a single classifier, boosting the model's accuracy. The performance of the generated model is evaluated on Face Forensics. This work is about building a new, powerful model for automatically identifying deepfake videos with the DeepFake-Detection-Challenges (DFDC) dataset. We test our model using DFDC, one of the most difficult datasets, and obtain an accuracy of 96%.
The Corona Virus Disease 2019 (COVID-19) effect has made telecommuting and remote learning the norm. The growing number of Internet-connected devices provides cyber attackers with more attack vectors. The development of malware by criminals also incorporates a number of sophisticated obfuscation techniques, making it difficult to classify and detect malware using conventional approaches. Therefore, this paper proposes a novel visualization-based malware classification system using transfer and ensemble learning (VMCTE). VMCTE has a strong anti-interference ability. Even if malware uses obfuscation, fuzzing, encryption, and other techniques to evade detection, it can be accurately classified into its corresponding malware family. Unlike traditional dynamic and static analysis techniques, VMCTE requires neither reverse engineering nor the aid of domain expert knowledge. The proposed classification system combines three strong deep convolutional neural networks (ResNet50, MobilenetV1, and MobilenetV2) as feature extractors, reduces the dimensionality of the extracted features using principal component analysis, and employs a support vector machine to establish the classification model. The semantic representations of malware images can be extracted using various convolutional neural network (CNN) architectures, obtaining higher-quality features than traditional methods. Integrating fine-tuned and non-fine-tuned classification models based on transfer learning can greatly enhance the capacity to classify various families of malware. The experimental findings on the Malimg dataset demonstrate that VMCTE can attain 99.64%, 99.64%, 99.66%, and 99.64% accuracy, F1-score, precision, and recall, respectively.
Nowadays, IT systems rely mainly on artificial intelligence (AI) algorithms to process data. AI is generally used to extract knowledge from stored information and, depending on the nature of the data, it may be necessary to apply different AI algorithms. In this article, a novel perspective on the use of AI to ensure cybersecurity through the study of network traffic is presented. This is done through the construction of a two-stage cyberattack classification ensemble model that addresses class imbalance following a one-vs-rest (OvR) approach. With the growing trend of cyberattacks, it is essential to implement techniques that ensure legitimate access to information. To address this issue, this work proposes a network traffic classification system for different categories based on several AI techniques. In the first task, binary models are generated to clearly differentiate each type of traffic from the rest. With the binary models generated, an ensemble model is developed in two phases, which allows the separation of legitimate and illegitimate traffic (phase 1) while also identifying the type of illegitimate traffic (phase 2). In this way, the proposed system allows a complete multiclass classification of network traffic. Global performance is estimated using a modern dataset (UNSW-NB15), evaluated using two approaches, and compared with other state-of-the-art works. Our proposal, based on the construction of a two-step model, reaches an F1 of 0.912 for the first level of binary classification and 0.7754 for the multiclass classification. These results show that the proposed system outperforms other state-of-the-art approaches (+0.75% and +3.54% for binary and multiclass classification, respectively) in terms of F1, as demonstrated through comparison together with other relevant classification metrics.
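The two-phase decision logic described in the network traffic abstract can be sketched as below. This is only one plausible reading of that description: the abstract does not say how the one-vs-rest scores are combined, so the threshold rule, the argmax in phase 2, and all score values are assumptions. The category names are taken from the UNSW-NB15 dataset.

```python
def classify_two_phase(ovr_scores, normal_label="Normal", threshold=0.5):
    """Phase 1 separates legitimate traffic; phase 2 names the attack type."""
    # Phase 1: the "Normal vs rest" model decides whether the flow is legitimate.
    if ovr_scores[normal_label] >= threshold:
        return normal_label
    # Phase 2: among attack classes, pick the most confident one-vs-rest model.
    attack_scores = {k: v for k, v in ovr_scores.items() if k != normal_label}
    return max(attack_scores, key=attack_scores.get)

# Hypothetical one-vs-rest scores for two flows.
flow_a = {"Normal": 0.91, "DoS": 0.05, "Exploits": 0.12, "Reconnaissance": 0.03}
flow_b = {"Normal": 0.18, "DoS": 0.22, "Exploits": 0.74, "Reconnaissance": 0.31}

print(classify_two_phase(flow_a), classify_two_phase(flow_b))
```

Separating the binary decision from the attack-type decision lets each phase be tuned and evaluated on its own, which matches the abstract's separate F1 figures for the binary and multiclass levels.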
Cardiotocography (CTG) represents the fetus's health inside the womb during labor. However, assessment of its readings can be a highly subjective process depending on the expertise of the obstetrician. Digital signals from fetal monitors acquire parameters (i.e., fetal heart rate, contractions, acceleration). Objective: This paper aims to classify the CTG readings containing imbalanced healthy, suspected, and pathological fetus readings. Method: We perform two sets of experiments. Firstly, we employ five classifiers, Random Forest (RF), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LGBM), without over-sampling to classify CTG readings into three categories: healthy, suspected, and pathological. Secondly, we employ an ensemble of the above-described classifiers with the over-sampling method. We use a random over-sampling technique to balance CTG records to train the ensemble models. We use 3602 CTG readings to train the ensemble classifiers and 1201 records to evaluate them. The outcomes of these classifiers are then fed into the soft voting classifier to obtain the most accurate results. Results: Each classifier is evaluated in terms of accuracy, Precision, Recall, F1-score, and Area Under the Receiver Operating Curve (AUROC) values. Results reveal that the XGBoost, LGBM, and CatBoost classifiers yielded 99% accuracy. Conclusion: Using ensemble classifiers over a balanced CTG dataset improves the detection accuracy compared to the previous studies and our first experiment. A soft voting classifier then compensates for the weakness of any individual classifier to yield superior performance of the overall model.
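The random over-sampling step used to balance the CTG records can be illustrated with a short sketch: minority classes are padded with randomly repeated copies of their own samples until every class matches the largest one. The record identifiers, class counts, and function name below are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Hypothetical imbalanced records: 5 healthy, 2 suspected, 1 pathological.
records = ["h1", "h2", "h3", "h4", "h5", "s1", "s2", "p1"]
classes = ["healthy"] * 5 + ["suspected"] * 2 + ["pathological"]

balanced_records, balanced_classes = random_oversample(records, classes)
print(Counter(balanced_classes))
```

Over-sampling is normally applied to the training split only, so that duplicated records cannot leak into the evaluation set.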
Parking space is usually very limited in major cities, especially Cairo, leading to traffic congestion, air pollution, and driver frustration. Existing car parking systems tend to tackle parking issues in a non-digitized manner. These systems require drivers to search for an empty parking space with no guarantee of finding one, wasting time and resources and causing unnecessary congestion. To address these issues, this paper proposes a digitized parking system with a proof-of-concept implementation that combines multiple technological concepts into one solution, with the advantage of using IoT for real-time tracking of parking availability. User authentication and automated payments are handled using a quick response (QR) code on entry and exit. Experiments were conducted on real data collected for six different locations in Cairo via a live popular times library. Several machine learning models were investigated in order to estimate the occupancy rate of certain places. Moreover, a clear analysis of the differences in performance is provided, with XGBoost being the final model deployed. It achieved the best results, with an R^2 score of 85.7%.
Diabetic Eye Disease (DED) is a fundamental cause of blindness in human beings. Different techniques have been proposed to forecast and examine the stages of Diabetic Retinopathy (DR). Machine Learning (ML) and Deep Learning (DL) algorithms are the predominant techniques used to analyze and explore DR images. Even though some solutions have been adopted to address DR, an efficient and accurate DR prediction method is still needed to improve performance. In this work, a hybrid technique was proposed for the classification and prediction of DR. The proposed hybrid technique consists of Ensemble Learning (EL), a 2-Dimensional Convolutional Neural Network (2D-CNN), Transfer Learning (TL), and a correlation method. Initially, the Stochastic Gradient Boosting (SGB) EL method was used to predict DR. Secondly, the boosting-based EL method was used to predict the DR of images. Thirdly, the 2D-CNN was applied to categorize the various stages of DR images. Finally, TL was adopted to transfer the classification prediction to the training datasets. When TL was applied, a new prediction feature was added. In the experiment, the proposed technique achieved 97.8% accuracy in the prediction of DR images and 98% accuracy in the grading of images. The experiment was also extended to measure the sensitivity (99.6%) and specificity (97.3%) metrics. The predicted accuracy rate was compared with existing methods.
Massive open online courses (MOOCs) have become a way of online learning across the world in the past few years. However, the extremely high dropout rate has brought many challenges to the development of online learning. Most current methods have low accuracy and poor generalization ability when dealing with high-dimensional dropout features. They focus on the analysis of the learning scores and check results of online courses but neglect phased student behaviors. Besides, the status of student participation at a given moment is necessarily affected by the prior status of learning. To address these issues, this paper proposes an ensemble learning model for early dropout prediction (ELM-EDP) that integrates attention-based document representation as a vector (A-Doc2vec), feature learning of course difficulty, and a weighted soft voting ensemble with heterogeneous classifiers (WSV-HC). First, A-Doc2vec is proposed to learn sequence features of the student behaviors of watching lecture videos and completing course assignments. It also captures the relationship between courses and videos. Then, a feature learning method is proposed to reduce the interference caused by differences in course difficulty on the dropout prediction. Finally, WSV-HC is proposed to combine the benefits of the boosting and bagging integration strategies. Experiments on the MOOCCube2020 dataset show that ELM-EDP achieves better results in Accuracy, Precision, Recall, and F1.
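The weighted soft voting used in WSV-HC can be illustrated in a few lines: each classifier outputs class probabilities, and the ensemble takes their weighted average before choosing the most probable class. The probabilities and weights below are hypothetical, and how the paper derives its weights is not stated in the abstract.

```python
def weighted_soft_vote(probabilities, weights):
    """Return (predicted class index, weighted average class probabilities)."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for p, w in zip(probabilities, weights)) / total
        for c in range(n_classes)
    ]
    return averaged.index(max(averaged)), averaged

# Hypothetical outputs of three heterogeneous classifiers for one student,
# as [P(continue), P(dropout)], with assumed weights favouring the second model.
probabilities = [[0.60, 0.40], [0.30, 0.70], [0.45, 0.55]]
weights = [1.0, 2.0, 1.0]

label, averaged = weighted_soft_vote(probabilities, weights)
print(label, [round(p, 4) for p in averaged])
```

In this example the first classifier alone would predict "continue", but the more heavily weighted second classifier shifts the ensemble decision to "dropout".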
Tree ensemble models have gained significant importance in addressing classification and prediction challenges. Boosting ensemble techniques are commonly employed for forecasting Type-II diabetes mellitus. Light Gradient Boosting Machine (LightGBM) is a widely used algorithm known for its leaf growth strategy, loss reduction, and enhanced training precision. However, LightGBM is prone to overfitting. In contrast, CatBoost utilizes balanced base predictors known as decision tables, which mitigate overfitting risks and significantly improve testing time efficiency. CatBoost's algorithm structure counteracts gradient boosting biases and incorporates an overfitting detector to stop training early. This study focuses on developing a hybrid model that combines LightGBM and CatBoost to minimize overfitting and improve accuracy by reducing variance. The Bayesian hyperparameter optimization method is used to find the best hyperparameters for the underlying learners. By fine-tuning the regularization parameter values, the hybrid model effectively reduces variance (overfitting). Comparative evaluation against the LightGBM, CatBoost, XGBoost, Decision Tree, Random Forest, AdaBoost, and GBM algorithms demonstrates that the hybrid model has the best F1-score (99.37%), recall (99.25%), and accuracy (99.37%). Consequently, the proposed framework holds promise for early diabetes prediction in the healthcare industry and exhibits potential applicability to other datasets sharing similarities with diabetes.
As a result of the increased number of COVID-19 cases, Ensemble Machine Learning (EML) would be an effective tool for combatting this pandemic outbreak. An ensemble of classifiers can improve the performance of single machine learning (ML) classifiers, especially stacking-based ensemble learning. Stacking utilizes heterogeneous base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results. However, building an ensemble often causes the model performance to decrease when an increasing number of learners are not properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacking-based ensemble learning (GA-Stacking) optimized by a Genetic Algorithm (GA) for outbreak prediction and health decision-aided processes. GA-Stacking utilizes five well-known classifiers at its first level: Decision Tree (DT), Random Forest (RF), RIGID regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost). It also uses the GA to determine the number, combination, and trust of these base classifiers, with the Mean Squared Error (MSE) as the fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) classifier is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance, with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model achieved good performance when compared with existing bagging-based approaches. The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures. (CMC, 2023, vol. 74, no. 2, p. 3946)
Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines (TBMs). During TBM tunnelling, a large amount of operation data is generated, reflecting the interaction between the TBM system and the surrounding rock, and these data can be used to evaluate rock mass quality. This study proposes a stacking ensemble classifier for the real-time prediction of rock mass classification using TBM operation data. Based on the Songhua River water conveyance project, a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes were obtained after data preprocessing. Then, through a tree-based feature selection method, 10 key TBM operation parameters were selected, and the mean values of the 10 selected features in the stable phase, after removing outliers, were calculated as the classifier inputs. The preprocessed data were randomly divided into a training set (90%) and a test set (10%) using simple random sampling. Besides the stacking ensemble classifier, seven individual classifiers were established for comparison: support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), decision tree (DT), logistic regression (LR), and multilayer perceptron (MLP), with the hyper-parameters of each classifier optimised using grid search. The prediction results show that the stacking ensemble classifier performs better than the individual classifiers and shows a more powerful learning and generalisation ability for small and imbalanced samples. Additionally, a relatively balanced training set is obtained by the synthetic minority oversampling technique (SMOTE), and the influence of sample imbalance on the prediction performance is discussed.
Abstract: Accurate wind power forecasting is critical for system integration and stability as renewable energy reliance grows. Traditional approaches frequently struggle with complex data and non-linear relationships. This article presents a novel approach for hybrid ensemble learning that is based on rigorous requirements engineering concepts. The approach finds significant parameters influencing forecasting accuracy by evaluating real-time Modern-Era Retrospective Analysis for Research and Applications (MERRA2) data from several European wind farms using in-depth stakeholder research and requirements elicitation. Ensemble learning is used to develop a robust model, while a temporal convolutional network handles time-series complexities and data gaps. The ensemble-temporal neural network is enhanced by providing different input parameters, including training layers, hidden and dropout layers, along with activation and loss functions. The proposed framework is further analyzed by comparing it with state-of-the-art forecasting models in terms of Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The energy efficiency performance indicators show that the proposed model achieves error reductions of approximately 16.67%, 28.57%, and 81.92% for MAE, and 38.46%, 17.65%, and 90.78% for RMSE, for MERRA wind farms 1, 2, and 3, respectively, compared to existing methods. These quantitative results show the effectiveness of the proposed model, with MAE values ranging from 0.0010 to 0.0156 and RMSE values ranging from 0.0014 to 0.0174. This work highlights the effectiveness of requirements engineering in wind power forecasting, leading to enhanced forecast accuracy and grid stability, ultimately paving the way for more sustainable energy solutions.
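The MAE, RMSE, and error-reduction figures quoted in this abstract follow standard definitions. A minimal sketch of how such numbers are typically computed is given here; the function names and all sample values are illustrative assumptions and are not taken from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - forecast|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def error_reduction_pct(baseline_error, proposed_error):
    """Relative improvement of a proposed model over a baseline, in percent."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error

# Hypothetical normalized power values, for illustration only
y_true = [0.50, 0.60, 0.70, 0.80]
y_pred = [0.52, 0.58, 0.73, 0.80]

print(mae(y_true, y_pred))
print(rmse(y_true, y_pred))
print(error_reduction_pct(0.5, 0.4))  # hypothetical baseline vs. proposed error
```

RMSE penalizes large residuals more heavily than MAE, which is why the two metrics can show different reduction percentages for the same pair of models.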
Abstract: Advanced Driver Assistance Systems (ADAS) technologies can assist drivers or be part of automatic driving systems to support the driving process and improve the level of safety and comfort on the road. The Traffic Sign Recognition System (TSRS) is one of the most important components of ADAS. Among the challenges with TSRS is being able to recognize road signs with the highest accuracy and the shortest processing time. Accordingly, this paper introduces a new real-time methodology for recognizing Speed Limit (SL) signs based on a trio of developed modules. Firstly, the Speed Limit Detection (SLD) module uses the Haar Cascade technique to generate a new SL detector in order to localize SL signs within captured frames. Secondly, the Speed Limit Classification (SLC) module, featuring machine learning classifiers alongside a newly developed model called DeepSL, harnesses the power of a CNN architecture to extract intricate features from speed limit sign images, ensuring efficient and precise recognition. In addition, a new Speed Limit Classifiers Fusion (SLCF) module has been developed by combining trained ML classifiers and the DeepSL model using the Dempster-Shafer (DS) theory of belief functions and ensemble learning's voting technique. Through rigorous software and hardware validation processes, the proposed methodology has achieved F1 scores of 99.98% and 99.96% for DS theory and the voting method, respectively. Furthermore, a prototype encompassing all components demonstrates outstanding reliability and efficacy, with processing times of 150 ms on the Raspberry Pi board and 81.5 ms on the Jetson Nano board, marking a significant advancement in TSRS technology.
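The fusion module described in this abstract relies on the Dempster-Shafer theory of belief functions. A minimal sketch of Dempster's rule of combination for two sources follows, assuming each classifier's output has already been converted into a mass function; the class labels and mass values are hypothetical and do not come from the paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of labels -> mass)
    using Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize by the non-conflicting mass
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical masses from two classifiers over two speed-limit classes
S30, S50 = frozenset({"30"}), frozenset({"50"})
EITHER = S30 | S50  # mass left on "either class" expresses uncertainty
m_cnn = {S30: 0.8, S50: 0.1, EITHER: 0.1}
m_svm = {S30: 0.6, S50: 0.3, EITHER: 0.1}

fused = dempster_combine(m_cnn, m_svm)
decision = max(fused, key=fused.get)
print(sorted(decision), fused[decision])
```

Unlike plain voting, the rule lets a classifier leave part of its belief on a set of classes, so an uncertain source pulls the fused result less than a confident one.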
Funding: This research project was funded by the Deanship of Scientific Research, Princess Nourah bint Abdulrahman University, through the Program of Research Project Funding After Publication, grant No. (43-PRFA-P-58).
Abstract: This study presents a layered generalization ensemble model for next-generation radio mobiles, focusing on supervised channel estimation approaches. Channel estimation typically involves the insertion of pilot symbols with a well-balanced rhythm and suitable layout. The model, called Stacked Generalization for Channel Estimation (SGCE), aims to enhance channel estimation performance by eliminating pilot insertion and improving throughput. The SGCE model incorporates six machine learning methods: random forest (RF), gradient boosting machine (GB), light gradient boosting machine (LGBM), support vector regression (SVR), extremely randomized tree (ERT), and extreme gradient boosting (XGB). Meta-data are generated from five models (RF, GB, LGBM, SVR, and ERT), and the XGB model then uses them to predict the channel coefficients. To validate the modeling performance, we employ the leave-one-out cross-validation (LOOCV) approach, where each observation serves as the validation set while the remaining observations act as the training set. SGCE's results demonstrate higher mean and median accuracy than the separate models. SGCE achieves an average accuracy of 98.4%, precision of 98.1%, and the highest F1-score of 98.5%, accurately predicting channel coefficients. Furthermore, our proposed method outperforms prior traditional and intelligent techniques in terms of throughput and bit error rate. SGCE's superior performance highlights its efficacy in optimizing channel estimation. It can effectively predict channel coefficients and contribute to enhancing the overall efficiency of radio mobile systems. Through extensive experimentation and evaluation, we demonstrate that SGCE improves channel estimation performance, surpassing previous techniques. Accordingly, SGCE's capabilities have significant implications for optimizing channel estimation in modern communication systems.
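The leave-one-out cross-validation (LOOCV) procedure named in this abstract can be sketched as follows. This is only an illustration of the splitting scheme: the mean-value predictor is a placeholder standing in for the stacked model, and all names and data are assumptions.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, held_out_index), holding each sample out once."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out

def loocv_mean_abs_error(xs, ys, fit, predict):
    """Average absolute error over all leave-one-out folds."""
    errors = []
    for train, test in leave_one_out_splits(len(xs)):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(abs(predict(model, xs[test]) - ys[test]))
    return sum(errors) / len(errors)

# Placeholder model: always predicts the mean of the training targets
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def predict_mean(model, x):
    return model

# Hypothetical data, for illustration only
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0, 4.0]
score = loocv_mean_abs_error(xs, ys, fit_mean, predict_mean)
print(score)
```

LOOCV trains one model per observation, so its cost grows linearly with the dataset size; it is usually chosen when data are scarce.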
Funding: This work is supported by the EIAS (Emerging Intelligent Autonomous Systems) Data Science Lab, Prince Sultan University, Kingdom of Saudi Arabia, by paying the APC.
Abstract: The software development process depends largely on accurately identifying both essential and optional features. Initially, user needs are typically expressed in free-form language, requiring significant time and human resources to translate these into clear functional and non-functional requirements. To address this challenge, various machine learning (ML) methods have been explored to automate the understanding of these requirements, aiming to reduce time and human effort. However, existing techniques often struggle with complex instructions and large-scale projects. In our study, we introduce an innovative approach known as the Functional and Non-functional Requirements Classifier (FNRC). By combining the traditional random forest algorithm with the Accuracy Sliding Window (ASW) technique, we develop optimal sub-ensembles that surpass the initial classifier's accuracy while using fewer trees. Experimental results demonstrate that our FNRC methodology performs robustly across different datasets, achieving a balanced Precision of 75% on the PROMISE dataset and a Recall of 85% on the CCHIT dataset. Both datasets consistently maintain an F-measure around 64%, highlighting FNRC's ability to balance precision and recall in diverse scenarios. These findings contribute to more accurate and efficient software development processes, increasing the probability of achieving successful project outcomes.
Funding: Funded by the Key-Area Research and Development Program Project of Guangdong Province (2021B0101230003) and the China Southern Power Grid Science and Technology Project (ZBKJXM20220004).
Abstract: As the global demand for renewable energy grows, solar energy is gaining attention as a clean, sustainable energy source. Accurate assessment of solar energy resources is crucial for the siting and design of photovoltaic power plants. This study proposes an integrated deep learning-based photovoltaic resource assessment method. Ensemble learning and deep learning methods are fused for photovoltaic resource assessment for the first time. The proposed method combines the random forest, gated recurrent unit, and long short-term memory to improve the accuracy and reliability of photovoltaic resource assessment. The proposed method has strong adaptability and high accuracy even in the photovoltaic resource assessment of complex terrain and landscape. The experimental results show that the proposed method outperforms the comparison algorithms on all evaluation indexes, indicating that it has higher accuracy and reliability in photovoltaic resource assessment, with improved generalization performance over traditional single algorithms.
Funding: Supported by the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2016-AII).
Abstract: The accurate prediction of soybean yield is of great significance for agricultural production, monitoring, and early warning. Although previous studies have used machine learning algorithms to predict soybean yield based on meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms to improve prediction accuracy through ensemble learning has not been studied in depth. This study used and analyzed various daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang–Huai region), covering 34 years. Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as the base models to establish a high-precision and highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated using the five-year sliding prediction and four regression indicators for the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield based on the stacking model in 173 counties showed that the predictions reflect the spatiotemporal distribution of soybean yield in detail, and the mean absolute percentage error (MAPE) was less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
Abstract: In recent years, cervical cancer has become one of the most common diseases, occurring in women regardless of age. The disease is especially deadly because no symptoms appear until it is diagnosed at the last stage. For women at a certain age, it is better to have proper screening for cervical cancer. In most underdeveloped nations, frequent screening for cervical cancer is very difficult. Data mining and machine learning methodologies help widely in finding the important causes of cervical cancer. The proposed work describes a multi-class classification approach implemented for the dataset using Support Vector Machine (SVM) and the perception learning method. Most classification algorithms are designed for solving binary classification problems; using a heuristic approach, the problem is addressed here as a multiclass classification problem. A Gradient Boosting Machine (GBM) is also used in the implementation in order to increase classifier accuracy. The proposed model is evaluated in terms of accuracy and sensitivity, and it is found to work well in identifying the risk factors of cervical cancer.
Funding: The National Natural Science Foundation of China under contract No. 42001401; the China Postdoctoral Science Foundation under contract No. 2020M671431; the Fundamental Research Funds for the Central Universities under contract No. 0209-14380096; the Guangxi Innovative Development Grand Grant under contract No. 2018AA13005.
Abstract: The back propagation (BP) neural network method is widely used in bathymetry based on multispectral satellite imagery. However, the classical BP neural network method faces a potential problem because it easily falls into a local minimum, leading to model training failure. This study confirmed that the local minimum problem of the BP neural network method exists in the bathymetry field and cannot be ignored. Furthermore, to solve the local minimum problem of the BP neural network method, a bathymetry method based on a BP neural network and ensemble learning (BPEL) is proposed. First, the remote sensing imagery and training samples were used as input datasets, and the BP method was used as the base learner to produce multiple water depth inversion results. Then, a new ensemble strategy, namely the minimum outlying degree method, was proposed and used to integrate the water depth inversion results. Finally, an ensemble bathymetric map was acquired. Anda Reef, northeastern Jiuzhang Atoll, and the Pingtan coastal zone were selected as test cases to validate the proposed method. Compared with the BP neural network method, the BPEL method reduces the root-mean-square error and the average relative error by up to 0.65–2.84 m and 16%–46%, respectively, in the three test cases. The results showed that the proposed BPEL method could solve the local minimum problem of the BP neural network method and obtain highly robust and accurate bathymetric maps.
Funding: The authors thank the Deanship of Scientific Research at King Khalid University for funding this work under Grant Number (RGP 2/209/42), and the Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R136), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4210118DSR27).
Abstract: Proper waste management models using recent technologies like computer vision, machine learning (ML), and deep learning (DL) are needed to effectively handle the massive and increasing quantity of waste. Waste classification therefore becomes a crucial topic: it helps to categorize waste as hazardous or non-hazardous and thereby assists decision making in the waste management process. This study concentrates on the design of a hazardous waste detection and classification using ensemble learning (HWDC-EL) technique to reduce toxicity and improve human health. The goal of the HWDC-EL technique is to detect multiple classes of waste, particularly hazardous and non-hazardous waste. The HWDC-EL technique involves an ensemble of three feature extractors, combined using the Model Averaging technique, namely discrete local binary patterns (DLBP), EfficientNet, and DenseNet121. In addition, flower pollination algorithm (FPA) based hyperparameter optimizers are used to optimally adjust the parameters involved in the EfficientNet and DenseNet121 models. Moreover, a weighted voting-based ensemble classifier is derived using three machine learning algorithms, namely support vector machine (SVM), extreme learning machine (ELM), and gradient boosting tree (GBT). The performance of the HWDC-EL technique is tested using a benchmark Garbage dataset, and it obtains a maximum accuracy of 98.85%.
Funding: Supported by the National Key R&D Program of China (No. 2021YFC2100100), the National Natural Science Foundation of China (No. 21901157), and the Shanghai Science and Technology Project (No. 21JC1403400).
Abstract: Accurate regulation of two-dimensional materials has become an effective strategy to develop a wide range of catalytic applications. The introduction of heterogeneous components has a significant impact on the performance of materials, which makes it difficult to discover and understand the structure-property relationships at the atomic level. Here, we developed a novel and efficient ensemble learning classifier with the synthetic minority oversampling technique (SMOTE) to discover all possible arsenene catalysts with implanted heteroatoms for the hydrogen evolution reaction (HER). A total of 850 doped arsenenes were collected as a database, and 140 modified arsenene materials with different doping atoms and doping sites were identified as promising candidate catalysts for HER, with a machine learning prediction accuracy of 81%. Based on the results of machine learning, we proposed 13 low-cost and easily synthesized two-dimensional Fe-doped arsenene catalytic materials that are expected to contribute to highly efficient HER. The proposed ensemble method achieved high prediction accuracy while being millions of times faster at predicting Gibbs free energies and requiring only a small amount of data. This study indicates that the presented ensemble learning classifier is capable of screening highly efficient catalysts and can be further extended to predict other two-dimensional catalysts with delicate regulation.
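The synthetic minority oversampling technique (SMOTE) mentioned in this abstract creates new minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. A minimal sketch of that core step is shown here; the function name, the feature vectors, and the choice of k are assumptions for illustration, and a full implementation would also decide how many samples to generate per class.

```python
import random

def smote_sample(minority, k=2, rng=None):
    """Generate one synthetic point on the segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    base = rng.choice(minority)
    # Rank the other minority samples by squared Euclidean distance to base
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))

# Hypothetical 2-D minority-class feature vectors
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rng = random.Random(42)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
print(synthetic)
```

Because every synthetic point lies between two real minority samples, the class is enlarged without simply duplicating existing rows.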
Abstract: The emergence of deepfake videos in recent years has made image falsification a real danger. In a deepfake video or speech, a person's face and emotions are substituted with a different face or voice, employing deep learning to analyze speech or emotional content. Because these videos are often so convincing, manipulation is challenging to spot. Social media are the most frequent and dangerous targets, since they are weak outlets that are open to extortion or slander of a person. In earlier times, altering videos was not easy and required domain expertise and time. Nowadays, generating fake videos has become easier, and the videos have a high level of realism. Deepfakes are forgeries and altered visual data that appear in still photos or video footage. Numerous automatic identification systems have been developed to solve this issue; however, they are constrained to certain datasets and perform poorly when applied to different datasets. This study aims to develop an ensemble learning model utilizing a convolutional neural network (CNN) to handle deepfakes or Face2Face. We employed ensemble learning, a technique combining many classifiers to achieve higher prediction performance than a single classifier, boosting the model's accuracy. The performance of the generated model is evaluated on Face Forensics. This work is about building a new powerful model for automatically identifying deepfake videos with the DeepFake-Detection-Challenges (DFDC) dataset. We test our model using the DFDC, one of the most difficult datasets, and obtain an accuracy of 96%.
Funding: This work is supported, in part, by the National Natural Science Foundation of China under Grant Nos. 62102190 and 62272236, and, in part, by the Natural Science Foundation of Jiangsu Province under Grant Nos. BK20201136 and BK20191401.
Abstract: The Corona Virus Disease 2019 (COVID-19) effect has made telecommuting and remote learning the norm. The growing number of Internet-connected devices provides cyber attackers with more attack vectors. The development of malware by criminals also incorporates a number of sophisticated obfuscation techniques, making it difficult to classify and detect malware using conventional approaches. Therefore, this paper proposes a novel visualization-based malware classification system using transfer and ensemble learning (VMCTE). VMCTE has a strong anti-interference ability. Even if malware uses obfuscation, fuzzing, encryption, and other techniques to evade detection, it can be accurately classified into its corresponding malware family. Unlike traditional dynamic and static analysis techniques, VMCTE does not require either reverse engineering or the aid of domain expert knowledge. The proposed classification system combines three strong deep convolutional neural networks (ResNet50, MobileNetV1, and MobileNetV2) as feature extractors, reduces the dimension of the extracted features using principal component analysis, and employs a support vector machine to establish the classification model. The semantic representations of malware images can be extracted using various convolutional neural network (CNN) architectures, obtaining higher-quality features than traditional methods. Integrating fine-tuned and non-fine-tuned classification models based on transfer learning can greatly enhance the capacity to classify various families of malware. The experimental findings on the Malimg dataset demonstrate that VMCTE can attain 99.64%, 99.64%, 99.66%, and 99.64% accuracy, F1-score, precision, and recall, respectively.
Funding: Supported by the Junta de Extremadura (European Regional Development Fund), Consejería de Economía, Ciencia y Agenda Digital, under Project GR21099.
Abstract: Nowadays, IT systems rely mainly on artificial intelligence (AI) algorithms to process data. AI is generally used to extract knowledge from stored information and, depending on the nature of the data, it may be necessary to apply different AI algorithms. In this article, a novel perspective on the use of AI to ensure cybersecurity through the study of network traffic is presented. This is done through the construction of a two-stage cyberattack classification ensemble model addressing class imbalance following a one-vs-rest (OvR) approach. With the growing trend of cyberattacks, it is essential to implement techniques that ensure legitimate access to information. To address this issue, this work proposes a network traffic classification system for different categories based on several AI techniques. In the first task, binary models are generated to clearly differentiate each type of traffic from the rest. With the binary models generated, an ensemble model is developed in two phases, which allows the separation of legitimate and illegitimate traffic (phase 1) while also identifying the type of illegitimate traffic (phase 2). In this way, the proposed system allows a complete multiclass classification of network traffic. Global performance is estimated using a modern dataset (UNSW-NB15), evaluated using two approaches, and compared with other state-of-the-art works. Our proposal, based on the construction of a two-step model, reaches an F1 of 0.912 for the first level of binary classification and 0.7754 for the multiclass classification. These results show that the proposed system outperforms other state-of-the-art approaches (+0.75% and +3.54% for binary and multiclass classification, respectively) in terms of F1, as demonstrated through comparison together with other relevant classification metrics.
Abstract: Cardiotocography (CTG) represents the fetus's health inside the womb during labor. However, assessment of its readings can be a highly subjective process depending on the expertise of the obstetrician. Digital signals from fetal monitors acquire parameters (i.e., fetal heart rate, contractions, acceleration). Objective: This paper aims to classify CTG readings containing imbalanced healthy, suspected, and pathological fetus readings. Method: We perform two sets of experiments. Firstly, we employ five classifiers, Random Forest (RF), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LGBM), without over-sampling to classify CTG readings into three categories: healthy, suspected, and pathological. Secondly, we employ an ensemble of the above-described classifiers with the over-sampling method. We use a random over-sampling technique to balance the CTG records used to train the ensemble models. We use 3602 CTG readings to train the ensemble classifiers and 1201 records to evaluate them. The outcomes of these classifiers are then fed into the soft voting classifier to obtain the most accurate results. Results: Each classifier is evaluated on accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic curve (AUROC) values. Results reveal that the XGBoost, LGBM, and CatBoost classifiers yielded 99% accuracy. Conclusion: Using ensemble classifiers over a balanced CTG dataset improves the detection accuracy compared to previous studies and our first experiment. A soft voting classifier then compensates for the weaknesses of individual classifiers to yield superior performance of the overall model.
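The soft voting step described in this abstract averages the class probabilities produced by the individual classifiers and picks the class with the highest average. A minimal sketch follows, with optional weights to cover the weighted variant; the probabilities are made-up values for three hypothetical classifiers, not results from the paper.

```python
def soft_vote(probabilities, weights=None):
    """Average per-class probabilities across classifiers (optionally
    weighted) and return the winning label with the averaged scores."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    labels = list(probabilities[0].keys())
    averaged = {
        label: sum(w * p[label] for w, p in zip(weights, probabilities)) / total
        for label in labels
    }
    return max(averaged, key=averaged.get), averaged

# Hypothetical probability outputs of three classifiers for one CTG reading
clf_outputs = [
    {"healthy": 0.6, "suspected": 0.3, "pathological": 0.1},
    {"healthy": 0.2, "suspected": 0.5, "pathological": 0.3},
    {"healthy": 0.3, "suspected": 0.4, "pathological": 0.3},
]

label, scores = soft_vote(clf_outputs)
print(label, scores)
```

Compared with majority (hard) voting, soft voting uses each classifier's confidence, so a single very confident model can outweigh two barely decided ones.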
Abstract: Parking space is usually very limited in major cities, especially Cairo, leading to traffic congestion, air pollution, and driver frustration. Existing car parking systems tend to tackle parking issues in a non-digitized manner. These systems require drivers to search for an empty parking space with no guarantee of finding one, wasting time and resources and causing unnecessary congestion. To address these issues, this paper proposes a digitized parking system with a proof-of-concept implementation that combines multiple technological concepts into one solution, with the advantage of using IoT for real-time tracking of parking availability. User authentication and automated payments are handled using a quick response (QR) code on entry and exit. Experiments were conducted on real data collected for six different locations in Cairo via a live popular times library. Several machine learning models were investigated in order to estimate the occupancy rate of certain places. Moreover, a clear analysis of the differences in performance is presented, with the final deployed model being XGBoost. It achieved the best results, with an R² score of 85.7%.
Abstract: Diabetic Eye Disease (DED) is a fundamental cause of blindness in human beings. Different techniques have been proposed to forecast and examine the stages in the prognostication of Diabetic Retinopathy (DR). Machine Learning (ML) and Deep Learning (DL) algorithms are the predominant techniques used to project and explore DR images. Even though some solutions have been adopted to address DR, an efficient and accurate DR prediction method is still needed to improve performance. In this work, a hybrid technique is proposed for the classification and prediction of DR. The proposed hybrid technique consists of Ensemble Learning (EL), a 2-Dimensional Convolutional Neural Network (2D-CNN), Transfer Learning (TL), and a Correlation method. Initially, the Stochastic Gradient Boosting (SGB) EL method was used to predict DR. Secondly, the boosting-based EL method was used to predict the DR of images. Thirdly, the 2D-CNN was applied to categorize the various stages of DR images. Finally, TL was adopted to transfer the classification prediction to training datasets. Applying TL added a new prediction feature. In the experiment, the proposed technique achieved 97.8% accuracy in the prediction of DR images and 98% accuracy in the grading of images. The experiment was also extended to measure the sensitivity (99.6%) and specificity (97.3%) metrics. The predicted accuracy rate was compared with existing methods.
Funding: Supported by the National Natural Science Foundation of China (No. 61772231); the Natural Science Foundation of Shandong Province (No. ZR2022LZH016 & No. ZR2017MF025); the Project of Shandong Provincial Social Science Program (No. 18CHLJ39); the Shandong Provincial Key R&D Program of China (No. 2021CXGC010103); the Shandong Provincial Teaching Research Project of Graduate Education (No. SDYAL2022102 & No. SDYJG21034); and the Teaching Research Project of University of Jinan (No. JZ2212).
Abstract: Massive open online courses (MOOCs) have become a way of online learning across the world in the past few years. However, the extremely high dropout rate has brought many challenges to the development of online learning. Most of the current methods have low accuracy and poor generalization ability when dealing with high-dimensional dropout features. They focus on the analysis of the learning score and check result of an online course, but neglect the phased student behaviors. Besides, the status of student participation at a given moment is necessarily impacted by the prior status of learning. To address these issues, this paper proposes an ensemble learning model for early dropout prediction (ELM-EDP) that integrates attention-based document representation as a vector (A-Doc2vec), feature learning of course difficulty, and weighted soft voting ensemble with heterogeneous classifiers (WSV-HC). First, A-Doc2vec is proposed to learn sequence features of student behaviors of watching lecture videos and completing course assignments. It also captures the relationship between courses and videos. Then, a feature learning method is proposed to reduce the interference caused by differences in course difficulty on the dropout prediction. Finally, WSV-HC is proposed to highlight the benefits of the integration strategies of boosting and bagging. Experiments on the MOOCCube2020 dataset show that ELM-EDP achieves better results on Accuracy, Precision, Recall, and F1.
Abstract: Tree ensemble models have gained significant importance in addressing classification and prediction challenges. Boosting ensemble techniques are commonly employed for forecasting Type-II diabetes mellitus. Light Gradient Boosting Machine (LightGBM) is a widely used algorithm known for its leaf growth strategy, loss reduction, and enhanced training precision. However, LightGBM is prone to overfitting. In contrast, CatBoost utilizes balanced base predictors known as decision tables, which mitigate overfitting risks and significantly improve testing time efficiency. CatBoost's algorithm structure counteracts gradient boosting biases and incorporates an overfitting detector to stop training early. This study focuses on developing a hybrid model that combines LightGBM and CatBoost to minimize overfitting and improve accuracy by reducing variance. The Bayesian hyperparameter optimization method is used to find the best hyperparameters for the underlying learners. By fine-tuning the regularization parameter values, the hybrid model effectively reduces variance (overfitting). Comparative evaluation against LightGBM, CatBoost, XGBoost, Decision Tree, Random Forest, AdaBoost, and GBM algorithms demonstrates that the hybrid model has the best F1-score (99.37%), recall (99.25%), and accuracy (99.37%). Consequently, the proposed framework holds promise for early diabetes prediction in the healthcare industry and exhibits potential applicability to other datasets sharing similarities with diabetes.
Abstract: As a result of the increased number of COVID-19 cases, Ensemble Machine Learning (EML) would be an effective tool for combatting this pandemic outbreak. An ensemble of classifiers can improve the performance of single machine learning (ML) classifiers, especially stacking-based ensemble learning. Stacking utilizes heterogeneous base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results. However, building an ensemble often causes the model performance to decrease when a growing number of learners is not properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacking-based ensemble learning (GA-Stacking) optimized by a Genetic Algorithm (GA) for outbreak prediction and health decision-aiding processes. GA-Stacking utilizes five well-known classifiers at its first level: Decision Tree (DT), Random Forest (RF), Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost). It also uses the GA to determine the number, combination, and trust of these base classifiers, with the Mean Squared Error (MSE) as the fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) classifier is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model achieved good performance when compared with existing bagging-based approaches. The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures. (CMC, 2023, vol. 74, no. 2)
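The idea of letting a genetic algorithm choose which base learners feed the meta-model can be sketched as follows. This is a simplified illustration, not the paper's procedure. It assumes each chromosome is a boolean mask over base learners, the meta-model is ordinary least squares, and fitness is the MSE on a held-out split. The functions `fitness` and `ga_select`, the GA settings, and the synthetic predictions are all hypothetical, and the "trust" component mentioned in the abstract is not modelled.

```python
import numpy as np

def fitness(mask, P_fit, y_fit, P_val, y_val):
    """Validation MSE of a linear meta-model using the selected base learners."""
    if not mask.any():                       # empty ensembles are not allowed
        return np.inf
    A_fit = np.column_stack([P_fit[:, mask], np.ones(len(y_fit))])
    coef, *_ = np.linalg.lstsq(A_fit, y_fit, rcond=None)
    A_val = np.column_stack([P_val[:, mask], np.ones(len(y_val))])
    return float(np.mean((A_val @ coef - y_val) ** 2))

def ga_select(P_fit, y_fit, P_val, y_val, pop_size=12, generations=15,
              mutation_rate=0.1, seed=0):
    """Return (best mask, best MSE per generation) from a simple elitist GA."""
    rng = np.random.default_rng(seed)
    n = P_fit.shape[1]
    pop = rng.random((pop_size, n)) < 0.5    # random boolean chromosomes
    history = []
    for _ in range(generations):
        scores = np.array([fitness(m, P_fit, y_fit, P_val, y_val) for m in pop])
        best = pop[scores.argmin()].copy()
        history.append(float(scores.min()))
        children = [best]                    # elitism: keep the best unchanged
        while len(children) < pop_size:
            # tournament selection of two parents
            i, j, k, l = rng.integers(0, pop_size, size=4)
            pa = pop[i] if scores[i] <= scores[j] else pop[j]
            pb = pop[k] if scores[k] <= scores[l] else pop[l]
            child = np.where(rng.random(n) < 0.5, pa, pb)    # uniform crossover
            child = child ^ (rng.random(n) < mutation_rate)  # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(m, P_fit, y_fit, P_val, y_val) for m in pop])
    history.append(float(scores.min()))
    return pop[scores.argmin()], history

# Synthetic stand-in for base-learner predictions: the target plus noise of
# increasing strength, so later columns are progressively less useful.
data_rng = np.random.default_rng(1)
y = data_rng.normal(size=200)
noise_levels = [0.2, 0.5, 1.0, 3.0, 5.0]
P = np.column_stack([y + s * data_rng.normal(size=200) for s in noise_levels])
mask, history = ga_select(P[:100], y[:100], P[100:], y[100:])
```

Scoring on a held-out split matters here: on the fitting data, adding a column can never increase the least-squares error, so the GA would have no reason to drop a weak learner.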
Funding: Funded by the National Natural Science Foundation of China (Grant No. 41941019) and the State Key Laboratory of Hydroscience and Engineering (Grant No. 2019-KY-03).
Abstract: Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines (TBMs). During the TBM tunnelling process, a large amount of operation data is generated, reflecting the interaction between the TBM system and the surrounding rock, and these data can be used to evaluate the rock mass quality. This study proposes a stacking ensemble classifier for the real-time prediction of the rock mass classification using TBM operation data. Based on the Songhua River water conveyance project, a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing. Then, through a tree-based feature selection method, 10 key TBM operation parameters are selected, and the mean values of the 10 selected features in the stable phase, after removing outliers, are calculated as the inputs of the classifiers. The preprocessed data are randomly divided into a training set (90%) and a test set (10%) using simple random sampling. Besides the stacking ensemble classifier, seven individual classifiers are established for comparison. These classifiers include support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), decision tree (DT), logistic regression (LR), and multilayer perceptron (MLP), where the hyper-parameters of each classifier are optimised using the grid search method. The prediction results show that the stacking ensemble classifier performs better than the individual classifiers, and it shows a more powerful learning and generalisation ability for small and imbalanced samples. Additionally, a relatively balanced training set is obtained by the synthetic minority oversampling technique (SMOTE), and the influence of sample imbalance on the prediction performance is discussed.
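The core of SMOTE can be illustrated with a minimal sketch: each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function `smote`, its parameters, and the toy data are hypothetical and stand in for the study's actual features. In practice a maintained implementation would normally be used instead.

```python
import numpy as np

def smote(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours.
    Requires k < len(X_min)."""
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest points
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority class with two features (illustrative values only).
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
synthetic = smote(X_min, n_new=10)
```

Oversampling of this kind is applied to the training set only, so that the test set still reflects the original class distribution.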