The software development process depends largely on accurately identifying both essential and optional features. Initially, user needs are typically expressed in free-form language, requiring significant time and human resources to translate these into clear functional and non-functional requirements. To address this challenge, various machine learning (ML) methods have been explored to automate the understanding of these requirements, aiming to reduce time and human effort. However, existing techniques often struggle with complex instructions and large-scale projects. In our study, we introduce an innovative approach known as the Functional and Non-functional Requirements Classifier (FNRC). By combining the traditional random forest algorithm with the Accuracy Sliding Window (ASW) technique, we develop optimal sub-ensembles that surpass the initial classifier’s accuracy while using fewer trees. Experimental results demonstrate that our FNRC methodology performs robustly across different datasets, achieving a balanced Precision of 75% on the PROMISE dataset and an impressive Recall of 85% on the CCHIT dataset. Both datasets consistently maintain an F-measure around 64%, highlighting FNRC’s ability to effectively balance precision and recall in diverse scenarios. These findings contribute to more accurate and efficient software development processes, increasing the probability of achieving successful project outcomes.
The Indian Himalayan region frequently experiences climate change-induced landslides. Landslide susceptibility assessment therefore assumes greater significance for lessening the impact of landslide hazards. This paper assesses landslide susceptibility in Shimla district of the northwest Indian Himalayan region. It examines the effectiveness of random forest (RF), multilayer perceptron (MLP), sequential minimal optimization regression (SMOreg) and bagging ensemble (B-RF, B-SMOreg, B-MLP) models. A landslide inventory map comprising 1052 locations of past landslide occurrences was split into training (70%) and testing (30%) datasets. The site-specific influencing factors were selected by employing a multicollinearity test. The relationship between past landslide occurrences and influencing factors was established using the frequency ratio method. The effectiveness of the machine learning models was verified through performance assessors. The landslide susceptibility maps were validated by the area under the receiver operating characteristic curve (ROC-AUC), accuracy, precision, recall and F1-score. The key performance metrics and map validation demonstrated that the B-RF model (correlation coefficient: 0.988, mean absolute error: 0.010, root mean square error: 0.058, relative absolute error: 2.964, ROC-AUC: 0.947, accuracy: 0.778, precision: 0.819, recall: 0.917 and F1-score: 0.865) outperformed the single classifiers and the other bagging ensemble models for landslide susceptibility. The results show that the largest area was found under the very high susceptibility zone (33.87%), followed by the low (27.30%), high (20.68%) and moderate (18.16%) susceptibility zones. Average annual rainfall, slope, lithology, soil texture and earthquake magnitude have been identified as the influencing factors for very high landslide susceptibility. Soil texture, lineament density and elevation have been attributed to high and moderate susceptibility. Thus, the study calls for devising suitable landslide mitigation measures in the study area. Structural measures, an immediate response system, community participation and coordination among stakeholders may help lessen the detrimental impact of landslides. The findings from this study could aid decision-makers in mitigating future catastrophes and devising suitable strategies in other geographical regions with similar geological characteristics.
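The frequency ratio method mentioned in the landslide abstract above can be illustrated with a minimal sketch. It assumes the commonly used definition, in which the share of landslides falling in a factor class is divided by the share of the study area covered by that class. The function name, class labels and counts are hypothetical and are not taken from the paper.

```python
def frequency_ratio(landslides_per_class, cells_per_class):
    """Return the frequency ratio (FR) for each class of one influencing factor.

    FR = (share of landslides in the class) / (share of area in the class).
    FR > 1 suggests the class is more landslide-prone than average.
    """
    total_landslides = sum(landslides_per_class.values())
    total_cells = sum(cells_per_class.values())
    ratios = {}
    for cls, cells in cells_per_class.items():
        landslide_share = landslides_per_class.get(cls, 0) / total_landslides
        area_share = cells / total_cells
        ratios[cls] = landslide_share / area_share
    return ratios


# Hypothetical slope classes; the counts are made up for illustration only.
landslides = {"0-15 deg": 10, "15-30 deg": 30, "30-45 deg": 60}
cells = {"0-15 deg": 500, "15-30 deg": 300, "30-45 deg": 200}
fr = frequency_ratio(landslides, cells)
for cls, value in fr.items():
    print(f"{cls}: FR = {value:.2f}")
```

In this toy input the steepest class holds 60% of the landslides on 20% of the area, so its ratio is well above 1, while the flattest class falls well below 1.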
Data security assurance is crucial due to the increasing prevalence of cloud computing and its widespread use across different industries, especially in light of the growing number of cybersecurity threats. A major and ever-present threat is Ransomware-as-a-Service (RaaS) assaults, which enable even individuals with minimal technical knowledge to conduct ransomware operations. This study provides a new approach for RaaS attack detection that uses an ensemble of deep learning models. For this purpose, the network intrusion detection dataset “UNSW-NB15” from the Intelligent Security Group of the University of New South Wales, Australia is analyzed. In the initial phase, three separate Multi-Layer Perceptron (MLP) models are developed, based on the rectified linear unit, scaled exponential linear unit, and exponential linear unit activations, respectively. Then, using the combined predictive power of these three MLPs, the RansoDetect Fusion ensemble model is introduced in the suggested methodology. The proposed ensemble technique outperforms previous studies, with performance metrics of 98.79% accuracy and recall, 98.85% precision, and 98.80% F1-score. The empirical results of this study validate the ensemble model’s ability to improve cybersecurity defenses by showing that it outperforms individual MLP models. In expanding the field of cybersecurity strategy, this research highlights the significance of combined deep learning models in strengthening intrusion detection systems against sophisticated cyber threats.
Accurate wind power forecasting is critical for system integration and stability as renewable energy reliance grows. Traditional approaches frequently struggle with complex data and non-linear connections. This article presents a novel approach for hybrid ensemble learning that is based on rigorous requirements engineering concepts. The approach finds significant parameters influencing forecasting accuracy by evaluating real-time Modern-Era Retrospective Analysis for Research and Applications (MERRA2) data from several European wind farms using in-depth stakeholder research and requirements elicitation. Ensemble learning is used to develop a robust model, while a temporal convolutional network handles time-series complexities and data gaps. The ensemble-temporal neural network is enhanced by providing different input parameters, including training layers, hidden and dropout layers, and activation and loss functions. The proposed framework is further analyzed by comparing it with state-of-the-art forecasting models in terms of Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The energy efficiency performance indicators showed that the proposed model demonstrates error reduction percentages of approximately 16.67%, 28.57%, and 81.92% for MAE, and 38.46%, 17.65%, and 90.78% for RMSE for MERRA Wind farms 1, 2, and 3, respectively, compared to other existing methods. These quantitative results show the effectiveness of our proposed model, with MAE values ranging from 0.0010 to 0.0156 and RMSE values ranging from 0.0014 to 0.0174. This work highlights the effectiveness of requirements engineering in wind power forecasting, leading to enhanced forecast accuracy and grid stability, ultimately paving the way for more sustainable energy solutions.
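To make the error metrics in the wind power abstract concrete, here is a small sketch of MAE, RMSE, and an error reduction percentage. It assumes the reduction is computed relative to the baseline error, which the abstract does not state explicitly. The series and function names are hypothetical and are not the paper's data.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def error_reduction_pct(baseline_error, proposed_error):
    """Percentage by which the proposed model lowers the baseline error (assumed definition)."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical normalized power values, for illustration only.
actual = [0.50, 0.60, 0.70, 0.80]
baseline_forecast = [0.54, 0.56, 0.74, 0.76]
proposed_forecast = [0.52, 0.58, 0.72, 0.78]

mae_reduction = error_reduction_pct(mae(actual, baseline_forecast),
                                    mae(actual, proposed_forecast))
rmse_reduction = error_reduction_pct(rmse(actual, baseline_forecast),
                                     rmse(actual, proposed_forecast))
print(f"MAE reduction: {mae_reduction:.2f}%, RMSE reduction: {rmse_reduction:.2f}%")
```

Because every proposed error in this toy series is half the baseline error, both metrics show a reduction of about 50%.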
This study presents a layered generalization ensemble model for next-generation radio mobiles, focusing on supervised channel estimation approaches. Channel estimation typically involves the insertion of pilot symbols with a well-balanced rhythm and suitable layout. The model, called Stacked Generalization for Channel Estimation (SGCE), aims to enhance channel estimation performance by eliminating pilot insertion and improving throughput. The SGCE model incorporates six machine learning methods: random forest (RF), gradient boosting machine (GB), light gradient boosting machine (LGBM), support vector regression (SVR), extremely randomized tree (ERT), and extreme gradient boosting (XGB). By generating meta-data from five models (RF, GB, LGBM, SVR, and ERT), we ensure accurate channel coefficient predictions using the XGB model. To validate the modeling performance, we employ the leave-one-out cross-validation (LOOCV) approach, where each observation serves as the validation set while the remaining observations act as the training set. SGCE’s results demonstrate higher mean and median accuracy compared to the separate models. SGCE achieves an average accuracy of 98.4%, precision of 98.1%, and the highest F1-score of 98.5%, accurately predicting channel coefficients. Furthermore, our proposed method outperforms prior traditional and intelligent techniques in terms of throughput and bit error rate. SGCE’s superior performance highlights its efficacy in optimizing channel estimation. It can effectively predict channel coefficients and contribute to enhancing the overall efficiency of radio mobile systems. Through extensive experimentation and evaluation, we demonstrate that SGCE improved performance in channel estimation, surpassing previous techniques. Accordingly, SGCE’s capabilities have significant implications for optimizing channel estimation in modern communication systems.
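The stacking-with-LOOCV idea in the SGCE abstract above can be sketched in a few lines. This is a toy illustration, not the authors' pipeline. Two trivial base learners stand in for RF, GB, LGBM, SVR and ERT, and a one-parameter least-squares blend stands in for the XGB meta-learner. All function names and data are hypothetical.

```python
def fit_mean(xs, ys):
    """Base learner 1 (stand-in): always predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_nearest(xs, ys):
    """Base learner 2 (stand-in): one-nearest-neighbour regression on a scalar input."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


def loocv_meta_features(xs, ys, fitters):
    """Predict each observation with base models trained on all other observations."""
    rows = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        rows.append([fit(train_x, train_y)(xs[i]) for fit in fitters])
    return rows


def fit_blend_weight(meta_rows, ys):
    """Meta-learner (stand-in): least-squares weight w for w*p1 + (1-w)*p2."""
    num = sum((y - p2) * (p1 - p2) for (p1, p2), y in zip(meta_rows, ys))
    den = sum((p1 - p2) ** 2 for p1, p2 in meta_rows)
    return num / den if den else 0.5


# Hypothetical one-dimensional regression data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.1, 2.9, 4.2, 4.8]
fitters = [fit_mean, fit_nearest]

meta_rows = loocv_meta_features(xs, ys, fitters)   # out-of-sample meta-data
w = fit_blend_weight(meta_rows, ys)                # train the meta-learner on it
base_models = [fit(xs, ys) for fit in fitters]     # refit base learners on all data


def stacked_predict(x):
    """Combine the base predictions with the learned blend weight."""
    p1, p2 = (model(x) for model in base_models)
    return w * p1 + (1 - w) * p2


print(stacked_predict(2.5))
```

The meta-learner only ever sees predictions made for held-out observations, which is what keeps it from simply rewarding whichever base learner memorizes the training set best.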
As the global demand for renewable energy grows, solar energy is gaining attention as a clean, sustainable energy source. Accurate assessment of solar energy resources is crucial for the siting and design of photovoltaic power plants. This study proposes an integrated deep learning-based photovoltaic resource assessment method. Ensemble learning and deep learning methods are fused for photovoltaic resource assessment for the first time. The proposed method combines the random forest, gated recurrent unit, and long short-term memory to effectively improve the accuracy and reliability of photovoltaic resource assessment. The proposed method has strong adaptability and high accuracy even in the photovoltaic resource assessment of complex terrain and landscape. The experimental results show that the proposed method outperforms the comparison algorithm on all evaluation indexes, indicating that it has higher accuracy and reliability in photovoltaic resource assessment, with improved generalization performance over traditional single algorithms.
Medical image steganography aims to increase data security by concealing patient-personal information as well as diagnostic and therapeutic data in the spatial or frequency domain of radiological images. On the other hand, the discipline of image steganalysis generally provides a classification based on whether an image has hidden data or not. Inspired by previous studies on image steganalysis, this study proposes a deep ensemble learning model for medical image steganalysis to detect malicious hidden data in medical images and to develop medical image steganography methods aimed at securing personal information. With this purpose in mind, a dataset containing brain Magnetic Resonance (MR) images of healthy individuals and epileptic patients was built. Spatial Version of the Universal Wavelet Relative Distortion (S-UNIWARD), Highly Undetectable Stego (HUGO), and Minimizing the Power of Optimal Detector (MIPOD) techniques used in spatial image steganalysis were adapted to the problem, and various payloads of confidential data were hidden in medical images. The architectures of medical image steganalysis networks were transferred separately from eleven Dense Convolutional Network (DenseNet), Residual Neural Network (ResNet), and Inception-based models. The steganalysis outputs of these networks were determined by assembling models separately for each spatial embedding method with different payload ratios. The study demonstrated the success of pre-trained ResNet, DenseNet, and Inception models in the cover-stego mismatch scenario for each hiding technique with different payloads. Due to the high detection accuracy achieved, the proposed model has the potential to lead to the development of novel medical image steganography algorithms that existing deep learning-based steganalysis methods cannot detect. The experiments and evaluations support this potential.
The accurate prediction of soybean yield is of great significance for agricultural production, monitoring and early warning. Although previous studies have used machine learning algorithms to predict soybean yield based on meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms to improve the prediction accuracy through ensemble learning algorithms has not been studied in depth. This study used and analyzed various daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang–Huai region), covering 34 years. Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as the base models to establish a high-precision and highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated by using the five-year sliding prediction and four regression indicators of the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield based on the stacking model in 173 counties showed that the prediction effect can reflect the spatiotemporal distribution of soybean yield in detail, and the mean absolute percentage error (MAPE) was less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
With the emergence of the fourth industrial revolution, interconnected devices and sensors generate large-scale, dynamic, and inharmonious data in Industrial Internet of Things (IIoT) platforms. Such vast heterogeneous data increase the challenges of security risks and data analysis procedures. As IIoT grows, cyber-attacks become more diverse and complex, making existing anomaly detection models less effective. In this paper, an ensemble deep learning model that uses the benefits of the Long Short-Term Memory (LSTM) and the AutoEncoder (AE) architecture to identify out-of-norm activities for cyber threat hunting in IIoT is proposed. In this model, the LSTM is applied to create a model on normal time series of data (past and present data) to learn normal data patterns, and the important features of the data are identified by the AE to reduce data dimension. In addition, the imbalanced nature of IIoT datasets has not been considered in most of the previous literature, resulting in low accuracy and performance. To solve this problem, the proposed model extracts new balanced data from the imbalanced datasets, and these new balanced data are fed into the deep LSTM AE anomaly detection model. The proposed model is evaluated on two real IIoT datasets, Gas Pipeline (GP) and Secure Water Treatment (SWaT), which are imbalanced and contain both long-term and short-term dependencies. The results are compared with conventional machine learning classifiers, namely Random Forest (RF), Multi-Layer Perceptron (MLP), Decision Tree (DT), and Support Vector Machines (SVM), and higher accuracy is obtained: 99.3% and 99.7% on the GP and SWaT datasets, respectively. Moreover, the proposed ensemble model is compared with advanced related models, including Stacked Auto-Encoders (SAE), Naive Bayes (NB), Projective Adaptive Resonance Theory (PART), Convolutional Auto-Encoder (C-AE), and the Package Signatures (PS) based LSTM (PS-LSTM) model.
This paper presents a novel computerized technique for the segmentation of nuclei in hematoxylin and eosin (H&E) stained histopathology images. The purpose of this study is to overcome the challenges faced in automated nuclei segmentation due to the diversity of nuclei structures that arise from differences in tissue types and staining protocols, as well as the segmentation of variable-sized and overlapping nuclei. To this end, the approach proposed in this study uses an ensemble of the UNet architecture with various Convolutional Neural Network (CNN) architectures as encoder backbones, along with stain normalization and test-time augmentation, to improve segmentation accuracy. Additionally, this paper employs a Structure-Preserving Color Normalization (SPCN) technique as a preprocessing step for stain normalization. The proposed model was trained and tested on both single-organ and multi-organ datasets, yielding an F1 score of 84.11%, mean Intersection over Union (IoU) of 81.67%, dice score of 84.11%, accuracy of 92.58% and precision of 83.78% on the multi-organ dataset, and an F1 score of 87.04%, mean IoU of 86.66%, dice score of 87.04%, accuracy of 96.69% and precision of 87.57% on the single-organ dataset. These findings demonstrate that the proposed model ensemble, coupled with the right pre-processing and post-processing techniques, enhances nuclei segmentation capabilities.
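The segmentation abstract above reports identical F1 and dice scores, which is expected: for a binary mask the Dice coefficient and the F1 score are the same quantity. A minimal sketch on flattened binary masks shows this alongside IoU. The masks and function name are hypothetical, and the code is not the paper's evaluation script.

```python
def segmentation_scores(predicted, truth):
    """Return (dice, iou, f1) for two flattened binary masks of equal length."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if t and not p)
    if tp + fp + fn == 0:
        # Both masks are empty: treat as a perfect match by convention.
        return 1.0, 1.0, 1.0
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return dice, iou, f1


# Hypothetical 8-pixel masks (1 = nucleus, 0 = background).
predicted = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
dice, iou, f1 = segmentation_scores(predicted, truth)
print(f"dice={dice:.2f} iou={iou:.2f} f1={f1:.2f}")
```

IoU is never larger than Dice on the same masks, since IoU = Dice / (2 - Dice).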
Landslides are a serious natural disaster, second only to earthquakes and floods, and pose a great threat to people’s lives and property. Traditional landslide research relies on experience-driven or statistical models, and its assessment results are subjective, difficult to quantify, and poorly targeted. As a new research method for landslide susceptibility assessment, machine learning can greatly improve the accuracy of landslide susceptibility models by constructing statistical models. Taking Western Henan as an example, the study selected 16 landslide influencing factors covering topography, geological environment, hydrological conditions, and human activities, from which the 11 factors with the most significant influence on landslides were selected by the recursive feature elimination (RFE) method. Five machine learning methods [Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Linear Discriminant Analysis (LDA)] were used to construct the spatial distribution model of landslide susceptibility. The models were evaluated by the receiver operating characteristic curve and statistical indices. After analysis and comparison, the XGBoost model (AUC 0.8759) performed the best and was suitable for dealing with regression problems. The model had a high adaptability to landslide data. The overall distribution can be observed from the landslide susceptibility maps of the five models. The extremely high and high susceptibility areas are distributed in the Funiu Mountain range in the southwest, the Xiaoshan Mountain range in the west, and the Yellow River Basin in the north. These areas have large terrain fluctuations, complicated geological structural environments and frequent human engineering activities. The extremely high and high susceptibility areas were 12043.3 km² and 3087.45 km², accounting for 47.61% and 12.20% of the total study area, respectively. Our study reflects the distribution of landslide susceptibility in western Henan Province, which provides a scientific basis for regional disaster warning, prediction, and resource protection. The study has important practical significance for subsequent landslide disaster management.
Typically, the relationship between well logs and lithofacies is complex, which leads to low accuracy of lithofacies identification. Machine learning (ML) methods are often applied to identify lithofacies using logs labelled by rock cores. However, these methods have accuracy limits to some extent. To further improve their accuracy, a practical and novel ensemble learning strategy and principles are proposed in this work, which allow geologists not familiar with ML to establish a good ML lithofacies identification model and help geologists familiar with ML further improve the accuracy of lithofacies identification. The ensemble learning strategy combines ML methods as sub-classifiers to generate a comprehensive lithofacies identification model, which aims to reduce the variance errors in prediction. Each sub-classifier is trained on randomly sampled labelled data with random features. The novelty of this work lies in the ensemble principles, which make each sub-classifier just overfit through algorithm parameter settings and sub-dataset sampling. The principles can help reduce the bias errors in the prediction. Two issues are discussed, namely (1) whether only a relatively simple single-classifier method can be used as a sub-classifier, and how to select proper ML methods as sub-classifiers; and (2) whether different kinds of ML methods can be combined as sub-classifiers and, if so, how to determine a proper combination. In order to test the effectiveness of the ensemble strategy and principles for lithofacies identification, different kinds of machine learning algorithms are selected as sub-classifiers, including regular classifiers (LDA, NB, KNN, ID3 tree and CART), a kernel method (SVM), and ensemble learning algorithms (RF, AdaBoost, XGBoost and LightGBM). The experiments used a published dataset of lithofacies from the Daniudi gas field (DGF) in the Ordos Basin, China. Based on a series of comparisons between ML algorithms and their corresponding ensemble models using the ensemble strategy and principles, the following conclusions are drawn: (1) not only decision trees but also other single classifiers and ensemble-learning classifiers can be used as sub-classifiers of homogeneous ensemble learning, and the ensemble can improve the accuracy of the original classifiers; (2) the ensemble principles for the introduced homogeneous and heterogeneous ensemble strategy are effective in promoting ML in lithofacies identification; (3) in practice, the heterogeneous ensemble is more suitable for building a more powerful lithofacies identification model, though it is complex.
As a result of the increased number of COVID-19 cases, Ensemble Machine Learning (EML) would be an effective tool for combatting this pandemic outbreak. An ensemble of classifiers can improve the performance of single machine learning (ML) classifiers, especially stacking-based ensemble learning. Stacking utilizes heterogeneous base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results. However, building an ensemble often causes the model performance to decrease when a growing number of learners are not properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacking-based ensemble learning (GA-Stacking) optimized by a Genetic Algorithm (GA) for outbreak prediction and health decision-support processes. GA-Stacking utilizes five well-known models, including Decision Tree (DT), Random Forest (RF), Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost), at its first level. It also uses the GA to determine the number, combination, and trust of these base models, with the Mean Squared Error (MSE) as the fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) model is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model achieved good performance when compared with existing bagging-based approaches. The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures. (CMC, 2023, vol. 74, no. 2)
Target maneuver trajectory prediction is an important prerequisite for air combat situation awareness and maneuver decision-making. However, how to use the large amount of trajectory data generated by air combat confrontation training to achieve real-time and accurate prediction of target maneuver trajectories is an urgent problem to be solved. To solve this problem, this paper proposes a hybrid algorithm, abbreviated as AERTrOS-Volterra, based on transfer learning, online learning, ensemble learning, regularization technology, a target maneuvering segmentation point recognition algorithm, and the Volterra series. Firstly, the model makes full use of the large number of trajectory samples generated by air combat confrontation training and constructs a Tr-Volterra algorithm framework suitable for air combat target maneuver trajectory prediction, which realizes the extraction of effective information from the historical trajectory data. Secondly, in order to improve the real-time online prediction accuracy and robustness of the prediction model in complex electromagnetic environments, a robust regularized online sequential Volterra prediction model is proposed on the basis of the Tr-Volterra algorithm framework by integrating an online learning method, regularization technology and an inverse weighting calculation method based on the a priori error. Finally, inspired by the good performance of model ensembles, an ensemble learning scheme is also incorporated into the proposed algorithm, which adaptively updates the ensemble prediction model according to the performance of the model on real-time samples and the recognition results of target maneuvering segmentation points, including the adaptation of model weights, the adaptation of parameters, and the dynamic inclusion and removal of models. Compared with many existing time series prediction methods, the newly proposed target maneuver trajectory prediction algorithm can fully mine the prior knowledge contained in the historical data to assist the current prediction. The rationality and effectiveness of the proposed algorithm are verified by simulation on three chaotic time series datasets and one real target maneuver trajectory dataset.
The emergence of deepfake videos in recent years has made image falsification a real danger. A person’s face and emotions are deep-faked in a video or speech and are substituted with a different face or voice, employing deep learning to analyze speech or emotional content. Because these videos are often so convincing, manipulation is challenging to spot. Social media are the most frequent and dangerous targets, since they are weak outlets that are open to extortion or slander. In earlier times, altering videos was not easy and required domain expertise and time. Nowadays, generating fake videos has become easier, with a high level of realism. Deepfakes are forgeries and altered visual data that appear in still photos or video footage. Numerous automatic identification systems have been developed to solve this issue; however, they are constrained to certain datasets and perform poorly when applied to different datasets. This study aims to develop an ensemble learning model utilizing a convolutional neural network (CNN) to handle deepfakes or Face2Face. We employed ensemble learning, a technique combining many classifiers to achieve higher prediction performance than a single classifier, boosting the model’s accuracy. The performance of the generated model is evaluated on Face Forensics. This work is about building a new, powerful model for automatically identifying deepfake videos with the DeepFake-Detection-Challenges (DFDC) dataset. We test our model on the DFDC, one of the most difficult datasets, and obtain an accuracy of 96%.
The Corona Virus Disease 2019 (COVID-19) effect has made telecommuting and remote learning the norm. The growing number of Internet-connected devices provides cyber attackers with more attack vectors. The development of malware by criminals also incorporates a number of sophisticated obfuscation techniques, making it difficult to classify and detect malware using conventional approaches. Therefore, this paper proposes a novel visualization-based malware classification system using transfer and ensemble learning (VMCTE). VMCTE has a strong anti-interference ability. Even if malware uses obfuscation, fuzzing, encryption, and other techniques to evade detection, it can be accurately classified into its corresponding malware family. Unlike traditional dynamic and static analysis techniques, VMCTE does not require either reverse engineering or the aid of domain expert knowledge. The proposed classification system combines three strong deep convolutional neural networks (ResNet50, MobilenetV1, and MobilenetV2) as feature extractors, reduces the dimensionality of the extracted features using principal component analysis, and employs a support vector machine to establish the classification model. The semantic representations of malware images can be extracted using various convolutional neural network (CNN) architectures, obtaining higher-quality features than traditional methods. Integrating fine-tuned and non-fine-tuned classification models based on transfer learning can greatly enhance the capacity to classify various families of malware. The experimental findings on the Malimg dataset demonstrate that VMCTE can attain 99.64%, 99.64%, 99.66%, and 99.64% accuracy, F1-score, precision, and recall, respectively.
Proper waste management models based on recent technologies such as computer vision, machine learning (ML), and deep learning (DL) are needed to handle the massive and growing quantity of waste effectively. Waste classification therefore becomes a crucial topic: it categorizes waste as hazardous or non-hazardous and thereby assists decision making in the waste management process. This study concentrates on the design of a hazardous waste detection and classification technique using ensemble learning (HWDC-EL) to reduce toxicity and improve human health. The goal of the HWDC-EL technique is to detect multiple classes of waste, particularly hazardous and non-hazardous waste. The HWDC-EL technique combines three feature extractors, namely discrete local binary patterns (DLBP), EfficientNet, and DenseNet121, using a model averaging technique. In addition, flower pollination algorithm (FPA) based hyperparameter optimizers are used to tune the parameters of the EfficientNet and DenseNet121 models. Moreover, a weighted voting-based ensemble classifier is derived from three machine learning algorithms, namely support vector machine (SVM), extreme learning machine (ELM), and gradient boosting tree (GBT). The performance of the HWDC-EL technique is tested on a benchmark Garbage dataset, where it obtains a maximum accuracy of 98.85%.
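The weighted voting step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_vote`, the example labels, and the weights are all hypothetical, and the abstract does not say how the weights are chosen.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: one predicted label per base classifier.
    weights: one non-negative weight per base classifier (hypothetical values).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties resolve to the label that was seen first.
    return max(totals, key=totals.get)


# Hypothetical outputs of three base classifiers (e.g. SVM, ELM, GBT) for one image.
votes = ["hazardous", "non-hazardous", "hazardous"]
weights = [0.40, 0.35, 0.25]
print(weighted_vote(votes, weights))
```

With these illustrative weights, the two classifiers voting "hazardous" carry a combined weight of 0.65 against 0.35, so the ensemble follows them.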
This paper presents an efficient prediction model for a good learning environment using a Random Forest (RF) classifier. It consists of a series of modules: data preprocessing, data normalization, data split, and finally classification or prediction by the RF classifier. The preprocessed data is normalized using min-max normalization, which is often applied before model fitting. Because the input variables are measured at different scales, they must be normalized so that they contribute equally to the model fitting. Then the RF classifier, an ensemble learning method, is employed for course selection, and k-fold cross-validation (k=10) is used to validate the model. The proposed Prediction Model for Course Selection (PMCS) system is treated as a multi-class problem that predicts the course for a particular learner at one of three complexity levels, namely low, medium, and high. It operates in two modes, local and global. The former considers the gender of the learner and the latter does not. The database comprises learner opinions from 75 males and 75 females per category (low, medium, and high). Thus a total of 450 samples are used to evaluate the performance of the PMCS system. Results show that the local (gender-wise) system performs slightly better than the global system. The RF classifier with 75 decision trees provides an average accuracy of 97.6% in the global system, whereas in the local system it is 97% (male) and 97.6% (female). The overall performance of the RF classifier with 75 trees is better than with 25, 50, or 100 decision trees in both local and global systems.
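The preprocessing and validation steps named in this abstract (min-max normalization and 10-fold cross-validation) can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the folds are contiguous and unshuffled, and the abstract does not say whether the authors shuffled or stratified their splits.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


print(min_max_normalize([2, 4, 6]))
folds = list(k_fold_indices(450, k=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

For the 450 samples reported in the abstract, ten folds give 45 test samples and 405 training samples per fold.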
Quantum machine learning is attracting great interest in a wide range of fields due to its potentially superior performance and capabilities. The massive increase in the computational capacity and speed of quantum computers can lead to a quantum leap in the healthcare field. Heart disease seriously threatens human health, since it is the leading cause of death worldwide. Quantum machine learning methods can offer effective solutions for predicting heart disease and aiding early diagnosis. In this study, an ensemble machine learning model based on quantum machine learning classifiers is proposed to predict the risk of heart disease. The proposed model is a bagging ensemble learning model in which a quantum support vector classifier is used as the base classifier. Furthermore, to make the model's outcomes more explainable, the importance of every feature in the prediction is computed and visualized using the SHapley Additive exPlanations (SHAP) framework. In the experimental study, other stand-alone quantum classifiers, namely the Quantum Support Vector Classifier (QSVC), Quantum Neural Network (QNN), and Variational Quantum Classifier (VQC), are applied and compared with classical machine learning classifiers such as the Support Vector Machine (SVM) and Artificial Neural Network (ANN). The experimental results on the Cleveland dataset reveal the superiority of QSVC over the others, which explains its use in the proposed bagging model. The Bagging-QSVC model outperforms all of the aforementioned classifiers with an accuracy of 90.16%, while remaining highly competitive with some state-of-the-art models on the same dataset. The results of the study indicate that quantum machine learning classifiers perform better than classical machine learning classifiers in predicting heart disease. In addition, the study reveals that the bagging ensemble learning technique is effective in improving the prediction accuracy of quantum classifiers.
The Internet of Things (IoT) has seen dramatic growth in data dimensionality and traffic. Intrusion detection systems (IDS) are broadly used to enhance the security posture of an IT infrastructure. An IDS is a practical and suitable method for assuring network security and identifying attacks, protecting the network from intrusive hackers. Machine learning (ML) techniques are now used for detecting intrusions in IoT IDSs. However, IoT IDS mechanisms face significant challenges because of the physical and functional diversity of IoT devices. This diversity makes using every attribute and feature for IDS self-protection unrealistic and difficult. This study develops a Modified Metaheuristics with Weighted Majority Voting Ensemble Deep Learning (MM-WMVEDL) model for IDS. The proposed MM-WMVEDL technique aims to discriminate distinct kinds of attacks in the IoT environment. To this end, the MM-WMVEDL technique applies min-max normalization to scale the input dataset. For feature selection, it uses the Harris hawk optimization-based elite fractional derivative mutation (HHO-EFDM) technique. An ensemble of bi-directional long short-term memory (BiLSTM), extreme learning machine (ELM), and gated recurrent unit (GRU) models is then used for classification. A wide range of simulation analyses was performed on the CICIDS-2017 dataset to demonstrate the performance of the MM-WMVEDL technique. The comparison study showed that the MM-WMVEDL method outperforms other recent methods, with an accuracy of 99.67%.
Funding: This work is supported by the EIAS (Emerging Intelligent Autonomous Systems) Data Science Lab, Prince Sultan University, Kingdom of Saudi Arabia, which paid the APC.
Abstract: The software development process depends largely on accurately identifying both essential and optional features. Initially, user needs are typically expressed in free-form language, and translating them into clear functional and non-functional requirements takes significant time and human resources. To address this challenge, various machine learning (ML) methods have been explored to automate the understanding of these requirements, aiming to reduce time and human effort. However, existing techniques often struggle with complex instructions and large-scale projects. In our study, we introduce an approach known as the Functional and Non-functional Requirements Classifier (FNRC). By combining the traditional random forest algorithm with the Accuracy Sliding Window (ASW) technique, we develop optimal sub-ensembles that surpass the initial classifier's accuracy while using fewer trees. Experimental results demonstrate that our FNRC methodology performs robustly across different datasets, achieving a Precision of 75% on the PROMISE dataset and a Recall of 85% on the CCHIT dataset. Both datasets consistently maintain an F-measure of around 64%, highlighting FNRC's ability to balance precision and recall in diverse scenarios. These findings contribute to more accurate and efficient software development processes, increasing the probability of successful project outcomes.
Abstract: The Indian Himalayan region frequently experiences climate change-induced landslides. Landslide susceptibility assessment is therefore increasingly important for lessening the impact of landslide hazards. This paper assesses landslide susceptibility in the Shimla district of the northwest Indian Himalayan region. It examines the effectiveness of random forest (RF), multilayer perceptron (MLP), sequential minimal optimization regression (SMOreg), and bagging ensemble (B-RF, B-SMOreg, B-MLP) models. A landslide inventory map comprising 1052 locations of past landslide occurrences was split into training (70%) and testing (30%) datasets. The site-specific influencing factors were selected by means of a multicollinearity test. The relationship between past landslide occurrences and influencing factors was established using the frequency ratio method. The effectiveness of the machine learning models was verified through performance assessors. The landslide susceptibility maps were validated by the area under the receiver operating characteristic curve (ROC-AUC), accuracy, precision, recall, and F1-score. The key performance metrics and map validation demonstrated that the B-RF model (correlation coefficient: 0.988, mean absolute error: 0.010, root mean square error: 0.058, relative absolute error: 2.964, ROC-AUC: 0.947, accuracy: 0.778, precision: 0.819, recall: 0.917, and F1-score: 0.865) outperformed the single classifiers and the other bagging ensemble models for landslide susceptibility. The results show that the largest area fell in the very high susceptibility zone (33.87%), followed by the low (27.30%), high (20.68%), and moderate (18.16%) susceptibility zones. Average annual rainfall, slope, lithology, soil texture, and earthquake magnitude were identified as the influencing factors for very high landslide susceptibility. Soil texture, lineament density, and elevation were associated with high and moderate susceptibility. The study thus calls for suitable landslide mitigation measures in the study area. Structural measures, an immediate response system, community participation, and coordination among stakeholders may help lessen the detrimental impact of landslides. The findings could aid decision-makers in mitigating future catastrophes and devising suitable strategies in other geographical regions with similar geological characteristics.
Funding: Funded by the Deanship of Scientific Research, Najran University, Kingdom of Saudi Arabia, under the Research Groups Funding Program, Grant Code Number (NU/RG/SERC/12/43).
Abstract: Data security assurance is crucial given the increasing prevalence of cloud computing and its widespread use across industries, especially in light of the growing number of cybersecurity threats. A major and ever-present threat is Ransomware-as-a-Service (RaaS) attacks, which enable even individuals with minimal technical knowledge to conduct ransomware operations. This study provides a new approach to RaaS attack detection that uses an ensemble of deep learning models. For this purpose, the network intrusion detection dataset "UNSW-NB15" from the Intelligent Security Group of the University of New South Wales, Australia, is analyzed. In the initial phase, three separate Multi-Layer Perceptron (MLP) models are developed, based on the rectified linear unit, scaled exponential linear unit, and exponential linear unit activations, respectively. The RansoDetect Fusion ensemble model is then introduced, which combines the predictive power of these three MLPs. The proposed ensemble technique outperforms previous studies, with 98.79% accuracy and recall, 98.85% precision, and a 98.80% F1-score. The empirical results validate the ensemble model's ability to improve cybersecurity defenses by showing that it outperforms the individual MLP models. This research highlights the significance of combined deep learning models in strengthening intrusion detection systems against sophisticated cyber threats.
Abstract: Accurate wind power forecasting is critical for system integration and stability as reliance on renewable energy grows. Traditional approaches frequently struggle with complex data and non-linear relationships. This article presents a novel approach to hybrid ensemble learning that is based on rigorous requirements engineering concepts. The approach identifies significant parameters influencing forecasting accuracy by evaluating real-time Modern-Era Retrospective Analysis for Research and Applications (MERRA2) data from several European wind farms, using in-depth stakeholder research and requirements elicitation. Ensemble learning is used to develop a robust model, while a temporal convolutional network handles time-series complexities and data gaps. The ensemble-temporal neural network is tuned through different input parameters, including training layers, hidden and dropout layers, and activation and loss functions. The proposed framework is further analyzed by comparison with state-of-the-art forecasting models in terms of Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The performance indicators show that the proposed model achieves error reductions of approximately 16.67%, 28.57%, and 81.92% for MAE, and 38.46%, 17.65%, and 90.78% for RMSE, for MERRA wind farms 1, 2, and 3, respectively, compared with existing methods. These quantitative results show the effectiveness of the proposed model, with MAE values ranging from 0.0010 to 0.0156 and RMSE values ranging from 0.0014 to 0.0174. This work highlights the effectiveness of requirements engineering in wind power forecasting, leading to enhanced forecast accuracy and grid stability, and ultimately paving the way for more sustainable energy solutions.
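The error-reduction percentages quoted in this abstract are presumably relative improvements over a baseline error; the abstract does not give the formula, so the sketch below assumes the common definition (baseline minus proposed, divided by baseline). The function names and all sample values are hypothetical and are not taken from the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def error_reduction_pct(baseline_error, proposed_error):
    """Relative error reduction of the proposed model, in percent (assumed definition)."""
    return (baseline_error - proposed_error) / baseline_error * 100.0


# Hypothetical normalized power values and two hypothetical forecasts.
actual = [1.0, 2.0, 3.0]
baseline_forecast = [1.5, 2.5, 3.5]
proposed_forecast = [1.25, 2.25, 3.25]

mae_reduction = error_reduction_pct(mae(actual, baseline_forecast),
                                    mae(actual, proposed_forecast))
rmse_reduction = error_reduction_pct(rmse(actual, baseline_forecast),
                                     rmse(actual, proposed_forecast))
print(mae_reduction, rmse_reduction)
```

In this toy case the proposed forecast halves both errors, so both reductions are 50%.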
Funding: This research project was funded by the Deanship of Scientific Research, Princess Nourah bint Abdulrahman University, through the Program of Research Project Funding After Publication, grant No. (43-PRFA-P-58).
Abstract: This study presents a stacked generalization ensemble model for next-generation mobile radio systems, focusing on supervised channel estimation approaches. Channel estimation typically involves the insertion of pilot symbols with a well-balanced rhythm and a suitable layout. The model, called Stacked Generalization for Channel Estimation (SGCE), aims to enhance channel estimation performance by eliminating pilot insertion and improving throughput. The SGCE model incorporates six machine learning methods: random forest (RF), gradient boosting machine (GB), light gradient boosting machine (LGBM), support vector regression (SVR), extremely randomized tree (ERT), and extreme gradient boosting (XGB). Meta-data generated by five models (RF, GB, LGBM, SVR, and ERT) is fed to the XGB model, which predicts the channel coefficients. To validate the modeling performance, we employ the leave-one-out cross-validation (LOOCV) approach, in which each observation in turn serves as the validation set while the remaining observations act as the training set. The results show that SGCE achieves higher mean and median accuracy than the separate models. SGCE achieves an average accuracy of 98.4%, a precision of 98.1%, and the highest F1-score of 98.5%, accurately predicting channel coefficients. Furthermore, the proposed method outperforms prior traditional and intelligent techniques in terms of throughput and bit error rate. SGCE can effectively predict channel coefficients and contribute to enhancing the overall efficiency of mobile radio systems. Through extensive experimentation and evaluation, we demonstrate that SGCE improves channel estimation performance, surpassing previous techniques. Accordingly, SGCE's capabilities have significant implications for optimizing channel estimation in modern communication systems.
Funding: Funded by the Key-Area Research and Development Program Project of Guangdong Province (2021B0101230003) and the China Southern Power Grid Science and Technology Project (ZBKJXM20220004).
Abstract: As the global demand for renewable energy grows, solar energy is gaining attention as a clean, sustainable energy source. Accurate assessment of solar energy resources is crucial for the siting and design of photovoltaic power plants. This study proposes an integrated deep learning-based photovoltaic resource assessment method. Ensemble learning and deep learning methods are fused for photovoltaic resource assessment for the first time. The proposed method combines random forest, gated recurrent unit, and long short-term memory models to improve the accuracy and reliability of photovoltaic resource assessment. The proposed method is highly adaptable and accurate even for photovoltaic resource assessment in complex terrain and landscapes. The experimental results show that the proposed method outperforms the comparison algorithms on all evaluation indexes, indicating that it offers higher accuracy and reliability in photovoltaic resource assessment, with better generalization performance than traditional single algorithms.
Abstract: Medical image steganography aims to increase data security by concealing patients' personal information, as well as diagnostic and therapeutic data, in the spatial or frequency domain of radiological images. The discipline of image steganalysis, on the other hand, generally classifies an image according to whether or not it contains hidden data. Inspired by previous studies on image steganalysis, this study proposes a deep ensemble learning model for medical image steganalysis, both to detect malicious hidden data in medical images and to support the development of medical image steganography methods aimed at securing personal information. To this end, a dataset containing brain Magnetic Resonance (MR) images of healthy individuals and epileptic patients was built. The Spatial Version of the Universal Wavelet Relative Distortion (S-UNIWARD), Highly Undetectable Stego (HUGO), and Minimizing the Power of Optimal Detector (MIPOD) techniques used in spatial image steganalysis were adapted to the problem, and various payloads of confidential data were hidden in medical images. The architectures of the medical image steganalysis networks were transferred separately from eleven Dense Convolutional Network (DenseNet), Residual Neural Network (ResNet), and Inception-based models. The steganalysis outputs of these networks were determined by assembling models separately for each spatial embedding method at different payload ratios. The study demonstrated the success of pre-trained ResNet, DenseNet, and Inception models in the cover-stego mismatch scenario for each hiding technique at different payloads. Given the high detection accuracy achieved, the proposed model has the potential to lead to novel medical image steganography algorithms that existing deep learning-based steganalysis methods cannot detect. The experiments and evaluations support this approach.
Funding: Supported by the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2016-AII).
Abstract: The accurate prediction of soybean yield is of great significance for agricultural production, monitoring, and early warning. Although previous studies have used machine learning algorithms to predict soybean yield from meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms through ensemble learning to improve prediction accuracy has not been studied in depth. This study analyzed daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang-Huai region), covering 34 years. Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as base models to establish a high-precision, highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated using five-year sliding prediction and four regression indicators across the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimates of soybean yield based on the stacking model in the 173 counties showed that the predictions reflect the spatiotemporal distribution of soybean yield in detail, and the mean absolute percentage error (MAPE) was less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
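For readers unfamiliar with the mean absolute percentage error (MAPE) reported in this abstract, the following minimal sketch shows the standard definition. The function name and the yield values are hypothetical and are not data from the study.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any true value is zero, so that case raises an error.
    """
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted yields (arbitrary units) for two counties.
observed = [2.0, 4.0]
predicted = [1.9, 4.2]
print(mape(observed, predicted))
```

Each prediction here is off by 5% of the observed value, so the MAPE is approximately 5%, which is the threshold the abstract reports staying below.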
Abstract: With the emergence of the fourth industrial revolution, interconnected devices and sensors generate large-scale, dynamic, and heterogeneous data in Industrial Internet of Things (IIoT) platforms. Such vast heterogeneous data increases security risks and complicates data analysis procedures. As the IIoT grows, cyber-attacks become more diverse and complex, making existing anomaly detection models less effective. This paper proposes an ensemble deep learning model that combines the benefits of the Long Short-Term Memory (LSTM) and AutoEncoder (AE) architectures to identify out-of-norm activities for cyber threat hunting in the IIoT. In this model, the LSTM is trained on normal time series data (past and present data) to learn normal data patterns, and the AE identifies the important features of the data to reduce its dimension. In addition, the imbalanced nature of IIoT datasets has not been considered in most of the previous literature, resulting in low accuracy and performance. To solve this problem, the proposed model extracts new balanced data from the imbalanced datasets, and this balanced data is fed into the deep LSTM-AE anomaly detection model. The proposed model is evaluated on two real IIoT datasets, Gas Pipeline (GP) and Secure Water Treatment (SWaT), which are imbalanced and exhibit both long-term and short-term dependencies. The results are compared with conventional machine learning classifiers, namely Random Forest (RF), Multi-Layer Perceptron (MLP), Decision Tree (DT), and Support Vector Machines (SVM); the proposed model achieves higher accuracy, 99.3% and 99.7% on the GP and SWaT datasets, respectively. Moreover, the proposed ensemble model is compared with advanced related models, including Stacked Auto-Encoders (SAE), Naive Bayes (NB), Projective Adaptive Resonance Theory (PART), Convolutional Auto-Encoder (C-AE), and the Package Signatures (PS) based LSTM (PS-LSTM) model.
Abstract: This paper presents a novel computerized technique for the segmentation of nuclei in hematoxylin and eosin (H&E) stained histopathology images. The purpose of this study is to overcome the challenges of automated nuclei segmentation that arise from the diversity of nuclei structures across tissue types and staining protocols, as well as from variable-sized and overlapping nuclei. To this end, the proposed approach uses an ensemble of UNet architectures with various Convolutional Neural Network (CNN) architectures as encoder backbones, along with stain normalization and test-time augmentation, to improve segmentation accuracy. Additionally, this paper employs a Structure-Preserving Color Normalization (SPCN) technique as a preprocessing step for stain normalization. The proposed model was trained and tested on both single-organ and multi-organ datasets, yielding an F1 score of 84.11%, mean Intersection over Union (IoU) of 81.67%, Dice score of 84.11%, accuracy of 92.58%, and precision of 83.78% on the multi-organ dataset, and an F1 score of 87.04%, mean IoU of 86.66%, Dice score of 87.04%, accuracy of 96.69%, and precision of 87.57% on the single-organ dataset. These findings demonstrate that the proposed model ensemble, coupled with the right pre-processing and post-processing techniques, enhances nuclei segmentation capabilities.
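The abstract reports identical F1 and Dice scores on each dataset, which is expected: for a binary mask, the pixel-wise F1 score and the Dice coefficient are the same quantity. The sketch below illustrates this along with IoU on a toy example. The function names and masks are hypothetical, and how the paper averages its "mean IoU" is not stated in the abstract.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives for flat binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def dice(pred, truth):
    """Dice coefficient: 2*TP / (2*TP + FP + FN). Assumes at least one positive pixel."""
    tp, fp, fn = confusion_counts(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)


def iou(pred, truth):
    """Intersection over Union: TP / (TP + FP + FN). Assumes at least one positive pixel."""
    tp, fp, fn = confusion_counts(pred, truth)
    return tp / (tp + fp + fn)


def f1(pred, truth):
    """Pixel-wise F1 from precision and recall. Assumes TP > 0."""
    tp, fp, fn = confusion_counts(pred, truth)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical flattened 6-pixel masks (1 = nucleus, 0 = background).
predicted_mask = [1, 1, 1, 0, 0, 0]
ground_truth = [1, 1, 0, 1, 0, 0]
print(dice(predicted_mask, ground_truth),
      f1(predicted_mask, ground_truth),
      iou(predicted_mask, ground_truth))
```

On these toy masks there are two true positives, one false positive, and one false negative, so Dice and F1 both come to about 0.667 while IoU is 0.5.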
Funding: This work was financially supported by the National Natural Science Foundation of China (41972262), the Hebei Natural Science Foundation for Excellent Young Scholars (D2020504032), the Central Plains Science and Technology Innovation Leader Project (214200510030), and the Key Research and Development Project of Henan Province (221111321500).
Abstract: Landslides are a serious natural disaster, second only to earthquakes and floods, and pose a great threat to people's lives and property. Traditional landslide research relies on experience-driven or statistical models, and its assessment results are subjective, difficult to quantify, and not well targeted. As a newer approach to landslide susceptibility assessment, machine learning can greatly improve the accuracy of landslide susceptibility models. Taking western Henan as an example, the study considered 16 landslide influencing factors covering topography, geological environment, hydrological conditions, and human activities, from which the 11 factors with the most significant influence on landslides were selected by the recursive feature elimination (RFE) method. Five machine learning methods [Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Linear Discriminant Analysis (LDA)] were used to construct spatial distribution models of landslide susceptibility. The models were evaluated by the receiver operating characteristic curve and statistical indices. After analysis and comparison, the XGBoost model (AUC 0.8759) performed best and was suitable for dealing with regression problems. The model adapted well to the landslide data. The landslide susceptibility maps of the five models show the overall distribution. The extremely high and high susceptibility areas are distributed in the Funiu Mountain range in the southwest, the Xiaoshan Mountain range in the west, and the Yellow River Basin in the north. These areas have large terrain fluctuations, complicated geological structures, and frequent human engineering activities. The extremely high and high susceptibility areas covered 12,043.3 km² and 3,087.45 km², accounting for 47.61% and 12.20% of the total study area, respectively. Our study reflects the distribution of landslide susceptibility in western Henan Province and provides a scientific basis for regional disaster warning, prediction, and resource protection. The study has important practical significance for subsequent landslide disaster management.
Funding: Financially supported by the National Natural Science Foundation of China (Grant No. 42002134), the China Postdoctoral Science Foundation (Grant No. 2021T140735), and the Science Foundation of China University of Petroleum, Beijing (Grant Nos. 2462020XKJS02 and 2462020YXZZ004).
Abstract: The relationship between well logs and lithofacies is typically complex, which leads to low accuracy in lithofacies identification. Machine learning (ML) methods are often applied to identify lithofacies using logs labelled by rock cores. However, the accuracy of these methods is limited. To improve it further, this work proposes a practical and novel ensemble learning strategy and set of principles, which allow geologists unfamiliar with ML to establish a good ML lithofacies identification model and help geologists familiar with ML to further improve identification accuracy. The ensemble learning strategy combines ML methods as sub-classifiers to generate a comprehensive lithofacies identification model, with the aim of reducing variance errors in prediction. Each sub-classifier is trained on randomly sampled labelled data with random features. The novelty of this work lies in the ensemble principles, which make the sub-classifiers just overfit through algorithm parameter settings and sub-dataset sampling. The principles can help reduce bias errors in prediction. Two issues are discussed, namely (1) whether only relatively simple single-classifier methods can serve as sub-classifiers, and how to select proper ML methods as sub-classifiers; and (2) whether different kinds of ML methods can be combined as sub-classifiers and, if so, how to determine a proper combination. To test the effectiveness of the ensemble strategy and principles for lithofacies identification, different kinds of machine learning algorithms are selected as sub-classifiers, including regular classifiers (LDA, NB, KNN, ID3 tree, and CART), a kernel method (SVM), and ensemble learning algorithms (RF, AdaBoost, XGBoost, and LightGBM). The experiments used a published lithofacies dataset from the Daniudi gas field (DGF) in the Ordos Basin, China. Based on a series of comparisons between ML algorithms and their corresponding ensemble models built with the ensemble strategy and principles, the following conclusions are drawn: (1) not only decision trees but also other single classifiers and ensemble learning classifiers can be used as sub-classifiers in homogeneous ensemble learning, and the ensemble can improve the accuracy of the original classifiers; (2) the ensemble principles for the introduced homogeneous and heterogeneous ensemble strategies are effective in promoting ML in lithofacies identification; (3) in practice, a heterogeneous ensemble is more suitable for building a more powerful lithofacies identification model, though it is complex.
Abstract: As a result of the increased number of COVID-19 cases, Ensemble Machine Learning (EML) would be an effective tool for combatting this pandemic outbreak. An ensemble of classifiers can improve the performance of single machine learning (ML) classifiers, especially stacking-based ensemble learning. Stacking utilizes heterogeneous base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results. However, building an ensemble often causes the model performance to decrease because of the growing number of learners that are not properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacking-based ensemble learning (GA-Stacking) optimized by a Genetic Algorithm (GA) for outbreak prediction and health decision-aided processes. GA-Stacking utilizes five well-known classifiers, including Decision Tree (DT), Random Forest (RF), Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost), at its first level. It also introduces the GA to determine the number, combination, and trust of these base classifiers, using the Mean Squared Error (MSE) as a fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) classifier is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model achieved good performance when compared with existing bagging-based approaches. The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures. (CMC, 2023, vol. 74, no. 2.)
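The learner-selection idea in the abstract above can be illustrated with a small sketch. It is not the authors' GA-Stacking implementation: the function names, the bitmask encoding, the GA settings, and the use of plain averaging in place of the Linear Regression meta-model are all assumptions made for illustration. A bitmask switches base learners on or off, and the MSE of the combined prediction serves as the fitness to minimize.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def ensemble_prediction(mask, base_preds):
    """Average the predictions of the base learners switched on in mask.

    Assumption: averaging stands in for the second-level meta-model.
    """
    chosen = [preds for bit, preds in zip(mask, base_preds) if bit]
    return [sum(column) / len(chosen) for column in zip(*chosen)]


def fitness(mask, base_preds, y_true):
    """MSE of the masked ensemble; an empty ensemble is infinitely bad."""
    if not any(mask):
        return float("inf")
    return mse(y_true, ensemble_prediction(mask, base_preds))


def select_learners(base_preds, y_true, generations=30, pop_size=8, seed=0):
    """Tiny genetic algorithm over bitmasks (needs at least two base learners)."""
    rng = random.Random(seed)
    n = len(base_preds)
    # Start from the full ensemble plus random masks.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, base_preds, y_true))
        parents = population[: pop_size // 2]  # elitism: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1       # single-bit mutation
            children.append(child)
        population = parents + children
    return min(population, key=lambda m: fitness(m, base_preds, y_true))


# Hypothetical validation targets and base-learner predictions.
y_val = [1.0, 2.0, 3.0, 4.0]
base_preds = [
    [1.1, 2.1, 2.9, 4.0],  # accurate learner
    [0.9, 1.9, 3.1, 4.0],  # accurate learner with opposite errors
    [3.0, 3.0, 3.0, 3.0],  # poor learner
]
best_mask = select_learners(base_preds, y_val)
```

Because the best half of each generation is carried over unchanged, the returned mask is never worse than the full ensemble the search starts from.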
Funding: Supported by the Fundamental Research Funds for the Air Force Engineering University under Grant No. XZJK2019040.
Abstract: Target maneuver trajectory prediction is an important prerequisite for air combat situation awareness and maneuver decision-making. However, how to use the large amount of trajectory data generated by air combat confrontation training to achieve real-time and accurate prediction of target maneuver trajectories is an urgent problem to be solved. To solve this problem, this paper proposes a hybrid algorithm based on transfer learning, online learning, ensemble learning, regularization technology, a target maneuvering segmentation point recognition algorithm, and the Volterra series, abbreviated as AERTrOS-Volterra. Firstly, the model makes full use of the large number of trajectory samples generated by air combat confrontation training and constructs a Tr-Volterra algorithm framework suitable for air combat target maneuver trajectory prediction, which extracts effective information from the historical trajectory data. Secondly, in order to improve the real-time online prediction accuracy and robustness of the prediction model in complex electromagnetic environments, a robust regularized online sequential Volterra prediction model is proposed on the basis of the Tr-Volterra framework by integrating an online learning method, regularization technology, and an inverse weighting calculation method based on the a priori error. Finally, inspired by the strong performance of model ensembles, an ensemble learning scheme is also incorporated into the proposed algorithm. It adaptively updates the ensemble prediction model according to the performance of the model on real-time samples and the recognition results of target maneuvering segmentation points, including adaptation of model weights, adaptation of parameters, and dynamic inclusion and removal of models. Compared with many existing time series prediction methods, the newly proposed target maneuver trajectory prediction algorithm can fully mine the prior knowledge contained in the historical data to assist the current prediction. The rationality and effectiveness of the proposed algorithm are verified by simulation on three chaotic time series data sets and one real target maneuver trajectory data set.
Abstract: The emergence of deep fake videos in recent years has made image falsification a real danger. A person’s face and emotions are deep-faked in a video or speech and are substituted with a different face or voice, employing deep learning to analyze speech or emotional content. Because these videos are often so convincing, manipulation is challenging to spot. Social media are the most frequent and dangerous targets since they are weak outlets that are open to extortion or slander of a person. In earlier times, it was not so easy to alter videos, which required domain expertise and time. Nowadays, generating fake videos has become easier, and the videos have a high level of realism. Deepfakes are forgeries and altered visual data that appear in still photos or video footage. Numerous automatic identification systems have been developed to solve this issue; however, they are constrained to certain datasets and perform poorly when applied to different datasets. This study aims to develop an ensemble learning model utilizing a convolutional neural network (CNN) to handle deepfakes or Face2Face. We employed ensemble learning, a technique combining many classifiers to achieve higher prediction performance than a single classifier, boosting the model’s accuracy. The performance of the generated model is evaluated on Face Forensics. This work is about building a new powerful model for automatically identifying deep fake videos with the DeepFake-Detection-Challenges (DFDC) dataset. We test our model using the DFDC, one of the most difficult datasets, and get an accuracy of 96%.
Funding: This work is supported, in part, by the National Natural Science Foundation of China under Grant No. 62102190 and 62272236, and in part by the Natural Science Foundation of Jiangsu Province under Grant No. BK20201136 and BK20191401.
Abstract: The Corona Virus Disease 2019 (COVID-19) effect has made telecommuting and remote learning the norm. The growing number of Internet-connected devices provides cyber attackers with more attack vectors. The development of malware by criminals also incorporates a number of sophisticated obfuscation techniques, making it difficult to classify and detect malware using conventional approaches. Therefore, this paper proposes a novel visualization-based malware classification system using transfer and ensemble learning (VMCTE). VMCTE has a strong anti-interference ability. Even if malware uses obfuscation, fuzzing, encryption, and other techniques to evade detection, it can be accurately classified into its corresponding malware family. Unlike traditional dynamic and static analysis techniques, VMCTE does not require either reverse engineering or the aid of domain expert knowledge. The proposed classification system combines three strong deep convolutional neural networks (ResNet50, MobilenetV1, and MobilenetV2) as feature extractors, lessens the dimension of the extracted features using principal component analysis, and employs a support vector machine to establish the classification model. The semantic representations of malware images can be extracted using various convolutional neural network (CNN) architectures, obtaining higher-quality features than traditional methods. Integrating fine-tuned and non-fine-tuned classification models based on transfer learning can greatly enhance the capacity to classify various families of malware. The experimental findings on the Malimg dataset demonstrate that VMCTE can attain 99.64%, 99.64%, 99.66%, and 99.64% accuracy, F1-score, precision, and recall, respectively.
Funding: The Deanship of Scientific Research at King Khalid University funded this work under Grant Number (RGP 2/209/42); Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R136), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4210118DSR27).
Abstract: Proper waste management models using recent technologies like computer vision, machine learning (ML), and deep learning (DL) are needed to effectively handle the massive quantity of increasing waste. Therefore, waste classification becomes a crucial topic that helps to categorize waste as hazardous or non-hazardous and thereby assists decision making in the waste management process. This study concentrates on the design of a hazardous waste detection and classification using ensemble learning (HWDC-EL) technique to reduce toxicity and improve human health. The goal of the HWDC-EL technique is to detect multiple classes of waste, particularly hazardous and non-hazardous waste. The HWDC-EL technique involves an ensemble of three feature extractors combined using the Model Averaging technique, namely discrete local binary patterns (DLBP), EfficientNet, and DenseNet121. In addition, flower pollination algorithm (FPA) based hyperparameter optimizers are used to optimally adjust the parameters involved in the EfficientNet and DenseNet121 models. Moreover, a weighted voting-based ensemble classifier is derived using three machine learning algorithms, namely support vector machine (SVM), extreme learning machine (ELM), and gradient boosting tree (GBT). The performance of the HWDC-EL technique is tested using a benchmark Garbage dataset, and it obtains a maximum accuracy of 98.85%.
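As an illustration of the weighted voting step mentioned in the abstract above, here is a minimal sketch. The function name, the labels, and the weights are hypothetical; the abstract does not state how the HWDC-EL weights are obtained, so they are simply passed in.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per classifier (e.g. SVM, ELM, GBT).
    weights: one non-negative weight per classifier (assumed given).
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical outputs of three classifiers for one image.
majority_case = weighted_vote(
    ["hazardous", "non-hazardous", "hazardous"], [0.5, 0.3, 0.2]
)
# A single heavily weighted classifier can outvote the other two.
override_case = weighted_vote(
    ["hazardous", "non-hazardous", "non-hazardous"], [0.6, 0.2, 0.1]
)
```

With equal weights this reduces to ordinary majority voting; unequal weights let a more reliable classifier count for more.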
Abstract: This paper presents an efficient prediction model for a good learning environment using a Random Forest (RF) classifier. It consists of a series of modules: data preprocessing, data normalization, data split, and finally classification or prediction by the RF classifier. The preprocessed data is normalized using min-max normalization, which is often used before model fitting. As the input variables are measured at different scales, it is necessary to normalize them so that they contribute equally to the model fitting. Then, the RF classifier, an ensemble learning method, is employed for course selection, and k-fold cross-validation (k=10) is used to validate the model. The proposed Prediction Model for Course Selection (PMCS) system is treated as a multi-class problem that predicts the course for a particular learner with three complexity levels, namely low, medium and high. It is operated under two modes: locally and globally. The former considers the gender of the learner and the latter does not. The database comprises learner opinions from 75 males and 75 females per category (low, medium and high). Thus a total of 450 samples is used to evaluate the performance of the PMCS system. Results show that the system, when used locally (i.e., gender-wise), performs slightly better than the global system. The RF classifier with 75 decision trees in the global system provides an average accuracy of 97.6%, whereas in the local system it is 97% (male) and 97.6% (female). The overall performance of the RF classifier with 75 trees is better than with 25, 50 and 100 decision trees in both local and global systems.
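The two preprocessing and validation steps named in the abstract above, min-max normalization and 10-fold cross-validation, can be sketched as follows. The function names are hypothetical, and the fold assignment here is a simple unshuffled round-robin, which is an assumption; the abstract does not say how the folds were formed.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no scale information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are dealt to folds round-robin without shuffling.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


scaled = min_max_normalize([10, 20, 30])
splits = list(k_fold_indices(450, k=10))  # 450 samples, as in the abstract
```

With 450 samples and k=10, each fold holds 45 test samples and the remaining 405 are used for training, so every sample is tested exactly once.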
Funding: Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R196), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Nowadays, quantum machine learning is attracting great interest in a wide range of fields due to its potential superior performance and capabilities. The massive increase in computational capacity and speed of quantum computers can lead to a quantum leap in the healthcare field. Heart disease seriously threatens human health since it is the leading cause of death worldwide. Quantum machine learning methods can propose effective solutions to predict heart disease and aid in early diagnosis. In this study, an ensemble machine learning model based on quantum machine learning classifiers is proposed to predict the risk of heart disease. The proposed model is a bagging ensemble learning model where a quantum support vector classifier was used as a base classifier. Furthermore, in order to make the model’s outcomes more explainable, the importance of every single feature in the prediction is computed and visualized using the SHapley Additive exPlanations (SHAP) framework. In the experimental study, other stand-alone quantum classifiers, namely, Quantum Support Vector Classifier (QSVC), Quantum Neural Network (QNN), and Variational Quantum Classifier (VQC), are applied and compared with classical machine learning classifiers such as Support Vector Machine (SVM) and Artificial Neural Network (ANN). The experimental results on the Cleveland dataset reveal the superiority of QSVC compared to the others, which explains its use in the proposed bagging model. The Bagging-QSVC model outperforms all aforementioned classifiers with an accuracy of 90.16% while showing great competitiveness compared to some state-of-the-art models using the same dataset. The results of the study indicate that quantum machine learning classifiers perform better than classical machine learning classifiers in predicting heart disease. In addition, the study reveals that the bagging ensemble learning technique is effective in improving the prediction accuracy of quantum classifiers.
Funding: Funded by Institutional Fund Projects under Grant No. (IFPIP: 667-612-1443).
Abstract: The Internet of Things (IoT) system has confronted dramatic growth in high dimensionality and data traffic. Intrusion detection systems (IDS) are broadly utilized to enhance the security posture of an IT infrastructure. An IDS is a practical and suitable method for assuring network security and identifying attacks by protecting the network from intrusive hackers. Nowadays, machine learning (ML)-related techniques are used for detecting intrusions in IoT IDSs. However, the IoT IDS mechanism faces significant challenges because of physical and functional diversity. Such IoT characteristics make using every attribute and feature for IDS self-protection unrealistic and difficult. This study develops a Modified Metaheuristics with Weighted Majority Voting Ensemble Deep Learning (MM-WMVEDL) model for IDS. The proposed MM-WMVEDL technique aims to discriminate distinct kinds of attacks in the IoT environment. To attain this, the presented MM-WMVEDL technique implements min-max normalization to scale the input dataset. For feature selection purposes, the MM-WMVEDL technique exploits the Harris hawk optimization-based elite fractional derivative mutation (HHO-EFDM) technique. In the presented MM-WMVEDL technique, an ensemble of bi-directional long short-term memory (BiLSTM), extreme learning machine (ELM), and gated recurrent unit (GRU) models takes place. A wide range of simulation analyses was performed on the CICIDS-2017 dataset to exhibit the promising performance of the MM-WMVEDL technique. The comparison study pointed out the supremacy of the MM-WMVEDL method over other recent methods, with an accuracy of 99.67%.