Despite the maturity of ensemble numerical weather prediction (NWP), the resulting forecasts are still, more often than not, under-dispersed. As such, forecast calibration tools have become popular. Among those tools, quantile regression (QR) is highly competitive in terms of both flexibility and predictive performance. Nevertheless, a long-standing problem of QR is quantile crossing, which greatly limits the interpretability of QR-calibrated forecasts. On this point, this study proposes a non-crossing quantile regression neural network (NCQRNN) for calibrating ensemble NWP forecasts into a set of reliable quantile forecasts without crossing. The overarching design principle of NCQRNN is to add on top of the conventional QRNN structure another hidden layer, which imposes a non-decreasing mapping from the combined output of the nodes of the last hidden layer to the nodes of the output layer, through a triangular weight matrix with positive entries. The empirical part of the work considers a solar irradiance case study, in which four years of ensemble irradiance forecasts at seven locations, issued by the European Centre for Medium-Range Weather Forecasts, are calibrated via NCQRNN, as well as via an eclectic mix of benchmarking models, ranging from the naïve climatology to state-of-the-art deep-learning and other non-crossing models. Formal and stringent forecast verification suggests that the forecasts post-processed via NCQRNN attain the maximum sharpness subject to calibration amongst all competitors. Furthermore, the proposed approach to resolving quantile crossing is remarkably simple yet general, and thus has broad applicability, as it can be integrated with many shallow- and deep-learning-based neural networks.
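The non-crossing idea in the abstract above can be illustrated with a minimal sketch. It assumes (as an illustration only, not the authors' implementation) that each column of the triangular matrix holds one positive weight and that every hidden output after the first is clipped to be non-negative, so each quantile equals the previous one plus a non-negative increment. The names `softplus`, `noncrossing_quantiles`, `hidden`, and `raw_weights` are hypothetical.

```python
import math


def softplus(x):
    # Smooth map from any real number to a strictly positive one.
    return math.log1p(math.exp(x))


def noncrossing_quantiles(hidden, raw_weights):
    """Accumulate positive-weighted increments so the outputs never decrease.

    Equivalent to multiplying the increments by a lower-triangular matrix
    whose column j carries the positive weight softplus(raw_weights[j]).
    """
    quantiles = []
    running = 0.0
    for j, (h, r) in enumerate(zip(hidden, raw_weights)):
        weight = softplus(r)                 # strictly positive weight
        step = h if j == 0 else max(h, 0.0)  # only the lowest quantile may be negative
        running += weight * step
        quantiles.append(running)
    return quantiles


# Hypothetical hidden-layer outputs and unconstrained weights.
q = noncrossing_quantiles([-0.3, 0.5, -1.2, 0.8], [0.1, -2.0, 0.4, 1.5])
print(q)
```

Because the ordering is enforced by the structure rather than by a penalty term, any training loss can be used on top of it without reintroducing crossing.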
Potato cyst nematodes (PCNs) are a significant threat to potato production, having caused substantial damage in many countries. Predicting the future distribution of PCN species is crucial to implementing effective biosecurity strategies, especially given the impact of climate change on pest species invasion and distribution. Machine learning (ML), specifically ensemble models, has emerged as a powerful tool in predicting species distributions due to its ability to learn and make predictions based on complex data sets. Thus, this research utilised advanced machine learning techniques to predict the distribution of PCN species under climate change conditions, providing the initial element for invasion risk assessment. We first used Global Climate Models to generate homogeneous climate predictors to mitigate the variation among predictors. Then, five machine learning models were employed to build two groups of ensembles, single-algorithm ensembles (ESA) and multi-algorithm ensembles (EMA), and their performances were compared. In this research, the EMA did not always perform better than the ESA, and the ESA of Artificial Neural Network gave the highest performance while being cost-effective. Prediction results indicated that the distribution range of PCNs would shift northward, with a decrease in tropical zones and an increase in northern latitudes. However, the total area of suitable regions will not change significantly, occupying 16-20% of the total land surface (18% under current conditions). This research alerts policymakers and practitioners to the risk of PCNs’ incursion into new regions. Additionally, this ML process offers the capability to track changes in the distribution of various species and provides scientifically grounded evidence for formulating long-term biosecurity plans for their control.
Dear Editor, This letter presents a novel process monitoring model based on ensemble structure analysis (ESA). The ESA model takes advantage of principal component analysis (PCA), locality preserving projections (LPP), and multi-manifold projections (MMP) models, and then combines the multiple solutions within an ensemble result through Bayesian inference. In the developed ESA model, different structure features of the given dataset are taken into account simultaneously; the suitability and reliability of the ESA-based monitoring model are then illustrated through comparison. Introduction: The requirement for ensuring safe operation and improving process efficiency has led to increased research activity in the field of process monitoring.
Identifying rare patterns for medical diagnosis is a challenging task due to heterogeneity and the volume of data. Data summarization can create a concise version of the original data that can be used for effective diagnosis. In this paper, we propose an ensemble summarization method that combines clustering and sampling to create a summary of the original data to ensure the inclusion of rare patterns. To the best of our knowledge, there has been no such technique available to augment the performance of anomaly detection techniques and simultaneously increase the efficiency of medical diagnosis. The performance of popular anomaly detection algorithms increases significantly in terms of accuracy and computational complexity when the summaries are used. Therefore, the medical diagnosis becomes more effective, and our experimental results reflect that the combination of the proposed summarization scheme and all underlying algorithms used in this paper outperforms the most popular anomaly detection techniques.
We present a formulation of the single-trajectory entropy using the trajectory ensemble. The single-trajectory entropy is affected by its surrounding trajectories via the distribution function. The single-trajectory entropies are studied in two typical potentials, i.e., a harmonic potential and a double-well potential, and in a viscous environment by the interacting trajectory method. The results of the trajectory method agree well with those of the numerical methods (Monte Carlo simulation and difference equation). An increase (decrease) in the single-trajectory entropy could be caused by the absorption (emission) of heat from (to) the thermal environment. Also, some interesting trajectories, which correspond to rare events in the processes, are demonstrated.
A redundant-subspace-weighting (RSW)-based approach is proposed to enhance the frequency stability on a time scale of a clock ensemble. In this method, multiple overlapping subspaces are constructed in the clock ensemble, and the weight of each clock in this ensemble is defined by using the spatial covariance matrix. The superimposition average of covariances in different subspaces reduces the correlations between clocks in the same laboratory to some extent. After optimizing the parameters of this weighting procedure, the frequency stabilities of virtual clock ensembles are significantly improved in most cases.
Data security assurance is crucial due to the increasing prevalence of cloud computing and its widespread use across different industries, especially in light of the growing number of cybersecurity threats. A major and ever-present threat is Ransomware-as-a-Service (RaaS) assaults, which enable even individuals with minimal technical knowledge to conduct ransomware operations. This study provides a new approach for RaaS attack detection which uses an ensemble of deep learning models. For this purpose, the network intrusion detection dataset “UNSW-NB15” from the Intelligent Security Group of the University of New South Wales, Australia is analyzed. In the initial phase, three separate Multi-Layer Perceptron (MLP) models are developed, based on the rectified linear unit, the scaled exponential linear unit, and the exponential linear unit, respectively. Later, using the combined predictive power of these three MLPs, the RansoDetect Fusion ensemble model is introduced in the suggested methodology. The proposed ensemble technique outperforms previous studies with impressive performance metrics, including 98.79% accuracy and recall, 98.85% precision, and 98.80% F1-score. The empirical results of this study validate the ensemble model’s ability to improve cybersecurity defenses by showing that it outperforms individual MLP models. In expanding the field of cybersecurity strategy, this research highlights the significance of combined deep learning models in strengthening intrusion detection systems against sophisticated cyber threats.
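For reference, the three activation functions named in the abstract above can be written out as follows. This is a plain-Python sketch of their standard definitions, not code from the study; the ELU default `alpha=1.0` is a common convention and an assumption here, and the SELU constants are the standard published values.

```python
import math

# Standard SELU constants (from the self-normalizing networks formulation).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x):
    # Rectified linear unit: zero for negative inputs, identity otherwise.
    return max(0.0, x)


def elu(x, alpha=1.0):
    # Exponential linear unit: smooth negative branch saturating at -alpha.
    return x if x > 0 else alpha * math.expm1(x)


def selu(x):
    # Scaled ELU: an ELU with fixed alpha, multiplied by a fixed scale.
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * math.expm1(x))


for value in (-1.0, 0.0, 1.0):
    print(value, relu(value), elu(value), selu(value))
```

The three functions agree in shape for positive inputs and differ in how they treat negative ones, which is one plausible reason for combining MLPs built on each of them.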
This research introduces an innovative ensemble approach, combining Deep Residual Networks (ResNets) and Bidirectional Gated Recurrent Units (BiGRU), augmented with an Attention Mechanism, for the classification of heart arrhythmias. The escalating prevalence of cardiovascular diseases necessitates advanced diagnostic tools to enhance accuracy and efficiency. The model leverages the deep hierarchical feature extraction capabilities of ResNets, which are adept at identifying intricate patterns within electrocardiogram (ECG) data, while BiGRU layers capture the temporal dynamics essential for understanding the sequential nature of ECG signals. The integration of an Attention Mechanism refines the model’s focus on critical segments of ECG data, ensuring a nuanced analysis that highlights the most informative features for arrhythmia classification. Evaluated on a comprehensive dataset of 12-lead ECG recordings, our ensemble model demonstrates superior performance in distinguishing between various types of arrhythmias, with an accuracy of 98.4%, a precision of 98.1%, a recall of 98%, and an F-score of 98%. This novel combination of convolutional and recurrent neural networks, supplemented by attention-driven mechanisms, advances automated ECG analysis, contributing significantly to healthcare’s machine learning applications and presenting a step forward in developing non-invasive, efficient, and reliable tools for early diagnosis and management of heart diseases.
With the widespread use of machine learning (ML) technology, the operational efficiency and responsiveness of power grids have been significantly enhanced, allowing smart grids to achieve high levels of automation and intelligence. However, tree ensemble models commonly used in smart grids are vulnerable to adversarial attacks, making it urgent to enhance their robustness. To address this, we propose a robustness enhancement method that incorporates physical constraints into the node-splitting decisions of tree ensembles. Our algorithm improves robustness by developing a dataset of adversarial examples that comply with physical laws, ensuring training data accurately reflects possible attack scenarios while adhering to physical rules. In our experiments, the proposed method increased robustness against adversarial attacks by 100% when applied to real grid data under physical constraints. These results highlight the advantages of our method in maintaining efficient and secure operation of smart grids under adversarial conditions.
Ensemble prediction is widely used to represent the uncertainty of single deterministic Numerical Weather Prediction (NWP) caused by errors in initial conditions (ICs). The traditional Singular Vector (SV) initial perturbation method tends only to capture synoptic scale initial uncertainty rather than mesoscale uncertainty in global ensemble prediction. To address this issue, a multiscale SV initial perturbation method based on the China Meteorological Administration Global Ensemble Prediction System (CMA-GEPS) is proposed to quantify multiscale initial uncertainty. The multiscale SV initial perturbation approach entails calculating multiscale SVs at different resolutions with multiple linearized physical processes to capture fast-growing perturbations from mesoscale to synoptic scale in target areas and combining these SVs by using a Gaussian sampling method with amplitude coefficients to generate initial perturbations. Following that, the energy norm, energy spectrum, and structure of multiscale SVs and their impact on GEPS are analyzed based on a batch experiment in different seasons. The results show that the multiscale SV initial perturbations can possess more energy and capture more mesoscale uncertainties than the traditional single-SV method. Meanwhile, multiscale SV initial perturbations can reflect the strongest dynamical instability in target areas. Their performances in global ensemble prediction when compared to single-scale SVs are shown to (i) improve the relationship between the ensemble spread and the root-mean-square error and (ii) provide a better probability forecast skill for atmospheric circulation during the late forecast period and for short- to medium-range precipitation. This study provides scientific evidence and application foundations for the design and development of a multiscale SV initial perturbation method for the GEPS.
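The relationship between ensemble spread and root-mean-square error mentioned above can be made concrete with a small sketch. It uses the common textbook definitions (sample standard deviation of the members about the ensemble mean, and the RMSE of the ensemble mean against observations); the function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def ensemble_mean(members):
    return sum(members) / len(members)


def ensemble_spread(members):
    # Sample standard deviation of the members about the ensemble mean.
    mean = ensemble_mean(members)
    return math.sqrt(sum((m - mean) ** 2 for m in members) / (len(members) - 1))


def rmse(forecasts, observations):
    # Root-mean-square error of paired forecasts and observations.
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data: two forecast cases, three members each, with one observation per case.
cases = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
observations = [1.0, 5.0]

means = [ensemble_mean(c) for c in cases]
mean_spread = math.sqrt(sum(ensemble_spread(c) ** 2 for c in cases) / len(cases))
print(mean_spread, rmse(means, observations))
```

In a well-calibrated ensemble the two printed numbers are expected to be of similar size on average; a spread much smaller than the RMSE indicates under-dispersion.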
The software development process mostly depends on accurately identifying both essential and optional features. Initially, user needs are typically expressed in free-form language, requiring significant time and human resources to translate these into clear functional and non-functional requirements. To address this challenge, various machine learning (ML) methods have been explored to automate the understanding of these requirements, aiming to reduce time and human effort. However, existing techniques often struggle with complex instructions and large-scale projects. In our study, we introduce an innovative approach known as the Functional and Non-functional Requirements Classifier (FNRC). By combining the traditional random forest algorithm with the Accuracy Sliding Window (ASW) technique, we develop optimal sub-ensembles that surpass the initial classifier’s accuracy while using fewer trees. Experimental results demonstrate that our FNRC methodology performs robustly across different datasets, achieving a balanced Precision of 75% on the PROMISE dataset and an impressive Recall of 85% on the CCHIT dataset. Both datasets consistently maintain an F-measure around 64%, highlighting FNRC’s ability to effectively balance precision and recall in diverse scenarios. These findings contribute to more accurate and efficient software development processes, increasing the probability of achieving successful project outcomes.
Geotechnical engineering data are usually small-sample and high-dimensional, which brings many challenges in predictive modeling. This paper uses a typical high-dimensional and small-sample swell pressure (P_s) dataset to explore the possibility of using multi-algorithm hybrid ensemble and dimensionality reduction methods to mitigate the uncertainty of soil parameter prediction. Based on six machine learning (ML) algorithms, the base learner pool is constructed, and four ensemble methods, Stacking (SG), Blending (BG), Voting regression (VR), and Feature weight linear stacking (FWL), are used for the multi-algorithm ensemble. Furthermore, permutation importance is used for feature dimensionality reduction to mitigate the impact of weakly correlated variables on predictive modeling. The results show that the proposed methods are superior to traditional prediction models and base ML models, where FWL is more suitable for modeling with small-sample datasets, and dimensionality reduction can simplify the data structure and reduce the adverse impact of the small-sample effect, which points the way to feature selection for predictive modeling. Based on the ensemble methods, the feature importance of the five primary factors affecting P_s is the maximum dry density (31.145%), clay fraction (15.876%), swell percent (15.289%), plasticity index (14%), and optimum moisture content (13.69%). The influence of input parameters on P_s is also investigated, and the results are in line with the findings of the existing literature.
Thyroid disorders represent a significant global health challenge, with hypothyroidism and hyperthyroidism as two common conditions arising from dysfunction in the thyroid gland. Accurate and timely diagnosis of these disorders is crucial for effective treatment and patient care. This research introduces a comprehensive approach to improve the accuracy of thyroid disorder diagnosis through the integration of ensemble stacking and advanced feature selection techniques. Sequential forward feature selection, sequential backward feature elimination, and bidirectional feature elimination are investigated in this study. In ensemble learning, random forest, adaptive boosting, and bagging classifiers are employed. The effectiveness of these techniques is evaluated using two different datasets obtained from the University of California Irvine Machine Learning Repository, both of which undergo preprocessing steps, including outlier removal, addressing missing data, data cleansing, and feature reduction. Extensive experimentation demonstrates the remarkable success of the proposed ensemble stacking and bidirectional feature elimination, achieving 100% and 99.86% accuracy in identifying hyperthyroidism and hypothyroidism, respectively. Beyond enhancing detection accuracy, the ensemble stacking model also demonstrated a streamlined computational complexity, which is pivotal for practical medical applications. It significantly outperformed existing studies with similar objectives, underscoring the viability and effectiveness of the proposed scheme. This research offers an innovative perspective and sets the platform for improved thyroid disorder diagnosis, with broader implications for healthcare and patient well-being.
The distribution of the nuclear ground-state spin in a two-body random ensemble (TBRE) was studied using a general classification neural network (NN) model with two-body interaction matrix elements as input features and the corresponding ground-state spins as labels or output predictions. The quantum many-body system problem exceeds the capability of our optimized NNs in terms of accurately predicting the ground-state spin of each sample within the TBRE. However, our NN model effectively captured the statistical properties of the ground-state spin because it learned the empirical regularity of the ground-state spin distribution in TBRE, as discovered by physicists.
The Indian Himalayan region is frequently experiencing climate change-induced landslides. Thus, landslide susceptibility assessment assumes greater significance for lessening the impact of a landslide hazard. This paper makes an attempt to assess landslide susceptibility in Shimla district of the northwest Indian Himalayan region. It examined the effectiveness of random forest (RF), multilayer perceptron (MLP), sequential minimal optimization regression (SMOreg) and bagging ensemble (B-RF, B-SMOreg, B-MLP) models. A landslide inventory map comprising 1052 locations of past landslide occurrences was classified into training (70%) and testing (30%) datasets. The site-specific influencing factors were selected by employing a multicollinearity test. The relationship between past landslide occurrences and influencing factors was established using the frequency ratio method. The effectiveness of machine learning models was verified through performance assessors. The landslide susceptibility maps were validated by the area under the receiver operating characteristic curves (ROC-AUC), accuracy, precision, recall and F1-score. The key performance metrics and map validation demonstrated that the B-RF model (correlation coefficient: 0.988, mean absolute error: 0.010, root mean square error: 0.058, relative absolute error: 2.964, ROC-AUC: 0.947, accuracy: 0.778, precision: 0.819, recall: 0.917 and F1-score: 0.865) outperformed the single classifiers and other bagging ensemble models for landslide susceptibility. The results show that the largest area was found under the very high susceptibility zone (33.87%), followed by the low (27.30%), high (20.68%) and moderate (18.16%) susceptibility zones. The factors, namely average annual rainfall, slope, lithology, soil texture and earthquake magnitude, have been identified as the influencing factors for very high landslide susceptibility. Soil texture, lineament density and elevation have been attributed to high and moderate susceptibility. Thus, the study calls for devising suitable landslide mitigation measures in the study area. Structural measures, an immediate response system, community participation and coordination among stakeholders may help lessen the detrimental impact of landslides. The findings from this study could aid decision-makers in mitigating future catastrophes and devising suitable strategies in other geographical regions with similar geological characteristics.
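As a quick consistency check on the metrics quoted above, the F1-score is the harmonic mean of precision and recall. The short sketch below applies that standard formula to the reported precision (0.819) and recall (0.917); the function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.819, 0.917)
print(round(f1, 3))  # 0.865
```

The result agrees with the F1-score of 0.865 reported for the bagged random forest model.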
This study investigates the application of deep learning, ensemble learning, metaheuristic optimization, and image processing techniques for detecting lung and colon cancers, aiming to enhance treatment efficacy and improve survival rates. We introduce a metaheuristic-driven two-stage ensemble deep learning model for efficient lung/colon cancer classification. The diagnosis of lung and colon cancers is attempted using several unique indicators by different versions of deep Convolutional Neural Networks (CNNs) in feature extraction and model constructions, and utilizing the power of various Machine Learning (ML) algorithms for final classification. Specifically, we consider different scenarios consisting of two-class colon cancer, three-class lung cancer, and five-class combined lung/colon cancer to conduct feature extraction using four CNNs. These extracted features are then integrated to create a comprehensive feature set. In the next step, the optimization of the feature selection is conducted using a metaheuristic algorithm based on the Electric Eel Foraging Optimization (EEFO). This optimized feature subset is subsequently employed in various ML algorithms to determine the most effective ones through a rigorous evaluation process. The top-performing algorithms are refined using the High-Performance Filter (HPF) and integrated into an ensemble learning framework employing weighted averaging. Our findings indicate that the proposed ensemble learning model significantly surpasses existing methods in classification accuracy across all datasets, achieving accuracies of 99.85% for the two-class, 98.70% for the three-class, and 98.96% for the five-class datasets.
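The weighted-averaging step at the end of the pipeline above can be sketched generically as follows. This is an illustration of the general technique, not the study's implementation: the function name, the per-model class probabilities, and the weights are all hypothetical, and how the study derives its weights is not stated in the abstract.

```python
def weighted_average(probabilities, weights):
    """Combine per-model class probabilities into one distribution.

    probabilities: one list of class probabilities per model.
    weights: one non-negative weight per model (normalized internally).
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for p, w in zip(probabilities, weights)) / total
        for c in range(n_classes)
    ]


# Hypothetical outputs of three classifiers on one two-class sample.
probs = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]
weights = [0.5, 0.3, 0.2]

avg = weighted_average(probs, weights)
predicted = max(range(len(avg)), key=avg.__getitem__)
print(avg, predicted)
```

Here the two higher-weighted models outvote the third, so the ensemble predicts class 0 even though one member strongly favours class 1.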
This study presents a layered generalization ensemble model for next generation radio mobiles, focusing on supervised channel estimation approaches. Channel estimation typically involves the insertion of pilot symbols with a well-balanced rhythm and suitable layout. The model, called Stacked Generalization for Channel Estimation (SGCE), aims to enhance channel estimation performance by eliminating pilot insertion and improving throughput. The SGCE model incorporates six machine learning methods: random forest (RF), gradient boosting machine (GB), light gradient boosting machine (LGBM), support vector regression (SVR), extremely randomized tree (ERT), and extreme gradient boosting (XGB). By generating meta-data from five models (RF, GB, LGBM, SVR, and ERT), we ensure accurate channel coefficient predictions using the XGB model. To validate the modeling performance, we employ the leave-one-out cross-validation (LOOCV) approach, where each observation serves as the validation set while the remaining observations act as the training set. SGCE’s results demonstrate higher mean and median accuracy than the separate models. SGCE achieves an average accuracy of 98.4%, precision of 98.1%, and the highest F1-score of 98.5%, accurately predicting channel coefficients. Furthermore, our proposed method outperforms prior traditional and intelligent techniques in terms of throughput and bit error rate. SGCE’s superior performance highlights its efficacy in optimizing channel estimation. It can effectively predict channel coefficients and contribute to enhancing the overall efficiency of radio mobile systems. Through extensive experimentation and evaluation, we demonstrate that SGCE improved performance in channel estimation, surpassing previous techniques. Accordingly, SGCE’s capabilities have significant implications for optimizing channel estimation in modern communication systems.
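The leave-one-out cross-validation scheme described above can be sketched in a few lines. This is a generic illustration, not the SGCE code: `leave_one_out` and `loocv_mae` are hypothetical names, and the "model" is a deliberately trivial placeholder (the mean of the training values) standing in for the stacked learners.

```python
def leave_one_out(n_samples):
    # Yield (training indices, held-out index) once per observation.
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, i


def loocv_mae(values):
    # Mean absolute error under LOOCV with a placeholder mean predictor.
    errors = []
    for train, held_out in leave_one_out(len(values)):
        prediction = sum(values[j] for j in train) / len(train)
        errors.append(abs(values[held_out] - prediction))
    return sum(errors) / len(errors)


print(list(leave_one_out(3)))
print(loocv_mae([1.0, 2.0, 3.0]))
```

LOOCV fits the model once per observation, so it is typically affordable only for small datasets or inexpensive models.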
This study investigated the growth of forecast errors stemming from initial conditions (ICs), lateral boundary conditions (LBCs), and model (MO) perturbations, as well as their interactions, by conducting seven 36 h convection-allowing ensemble forecast (CAEF) experiments. Two cases that occurred over the Yangtze-Huai River basin (YHRB) in East China, one with strong forcing (SF) and the other with weak forcing (WF), were selected to examine the sources of uncertainties associated with perturbation growth under varying forcing backgrounds and the influence of these backgrounds on growth. The perturbations exhibited distinct characteristics in terms of temporal evolution, spatial propagation, and vertical distribution under different forcing backgrounds, indicating a dependence between perturbation growth and forcing background. A comparison of the perturbation growth in different precipitation areas revealed that IC and LBC perturbations were significantly influenced by the location of precipitation in the SF case, while MO perturbations were more responsive to convection triggering and dominated in the WF case. The vertical distribution of perturbations showed that the sources of uncertainties and the performance of perturbations varied between SF and WF cases, with LBC perturbations displaying notable case dependence. Furthermore, the interactions between perturbations were considered by exploring the added values of different source perturbations. For the SF case, the added values of IC, LBC, and MO perturbations were reflected in different forecast periods and different source uncertainties, suggesting that the combination of multi-source perturbations can yield positive interactions. In the WF case, MO perturbations provided a more accurate estimation of uncertainties downstream of the Dabie Mountain and need to be prioritized in the research on perturbation development.
Funding: Supported by the National Natural Science Foundation of China (Project No. 42375192), the China Meteorological Administration Climate Change Special Program (CMA-CCSP; Project No. QBZ202315), and the Vector Stiftung through the Young Investigator Group "Artificial Intelligence for Probabilistic Weather Forecasting."
Funding: Funded by the National Key R&D Program of China (2021YFD1400200) and the Taishan Scholar Constructive Engineering Foundation of Shandong, China (tstp20221135).
Funding: Supported by the National Natural Science Foundation of China (61503204), the Natural Science Foundation of Zhejiang Province (Y16F030001), and the Natural Science Foundation of Ningbo City (2016A610092).
Abstract: Dear Editor, This letter presents a novel process monitoring model based on ensemble structure analysis (ESA). The ESA model takes advantage of principal component analysis (PCA), locality preserving projections (LPP), and multi-manifold projections (MMP) models, and then combines the multiple solutions within an ensemble result through Bayesian inference. In the developed ESA model, different structure features of the given dataset are taken into account simultaneously, and the suitability and reliability of the ESA-based monitoring model are then illustrated through comparison. Introduction: The requirement for ensuring safe operation and improving process efficiency has led to increased research activity in the field of process monitoring.
Abstract: Identifying rare patterns for medical diagnosis is a challenging task due to the heterogeneity and volume of data. Data summarization can create a concise version of the original data that can be used for effective diagnosis. In this paper, we propose an ensemble summarization method that combines clustering and sampling to create a summary of the original data that ensures the inclusion of rare patterns. To the best of our knowledge, no such technique has been available to augment the performance of anomaly detection techniques and simultaneously increase the efficiency of medical diagnosis. The performance of popular anomaly detection algorithms increases significantly in terms of accuracy and computational complexity when the summaries are used. Therefore, the medical diagnosis becomes more effective, and our experimental results show that the combination of the proposed summarization scheme and all underlying algorithms used in this paper outperforms the most popular anomaly detection techniques.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12234013) and the Natural Science Foundation of Shandong Province (Grant No. ZR2021LLZ009).
Abstract: We present a formulation of the single-trajectory entropy using the trajectory ensemble. The single-trajectory entropy is affected by its surrounding trajectories via the distribution function. The single-trajectory entropies are studied in two typical potentials, i.e., the harmonic potential and the double-well potential, and in a viscous environment, by the interacting trajectory method. The results of the trajectory methods agree well with those of the numerical methods (Monte Carlo simulation and difference equation). An increase (decrease) in the single-trajectory entropy could be caused by the absorption (emission) of heat from (to) the thermal environment. Also, some interesting trajectories, which correspond to the rare events in the processes, are demonstrated.
Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2021YFB3900701), the Science and Technology Plan Project of the State Administration for Market Regulation of China (Grant No. 2023MK178), and the National Natural Science Foundation of China (Grant No. 42227802).
Abstract: A redundant-subspace-weighting (RSW)-based approach is proposed to enhance the frequency stability on a time scale of a clock ensemble. In this method, multiple overlapping subspaces are constructed in the clock ensemble, and the weight of each clock in this ensemble is defined by using the spatial covariance matrix. The superimposition average of covariances in different subspaces reduces the correlations between clocks in the same laboratory to some extent. After optimizing the parameters of this weighting procedure, the frequency stabilities of virtual clock ensembles are significantly improved in most cases.
Funding: The authors thank the Deanship of Scientific Research, Najran University, Kingdom of Saudi Arabia, for funding this work under the Research Groups Funding Program, Grant Code Number NU/RG/SERC/12/43.
Abstract: Data security assurance is crucial due to the increasing prevalence of cloud computing and its widespread use across different industries, especially in light of the growing number of cybersecurity threats. A major and ever-present threat is Ransomware-as-a-Service (RaaS) assaults, which enable even individuals with minimal technical knowledge to conduct ransomware operations. This study provides a new approach for RaaS attack detection that uses an ensemble of deep learning models. For this purpose, the network intrusion detection dataset "UNSW-NB15" from the Intelligent Security Group of the University of New South Wales, Australia, is analyzed. In the initial phase, three separate Multi-Layer Perceptron (MLP) models are developed, based on the rectified linear unit, the scaled exponential linear unit, and the exponential linear unit, respectively. Later, using the combined predictive power of these three MLPs, the RansoDetect Fusion ensemble model is introduced in the suggested methodology. The proposed ensemble technique outperforms previous studies, with performance metrics of 98.79% accuracy and recall, 98.85% precision, and 98.80% F1-score. The empirical results of this study validate the ensemble model's ability to improve cybersecurity defenses by showing that it outperforms individual MLP models. In expanding the field of cybersecurity strategy, this research highlights the significance of combined deep learning models in strengthening intrusion detection systems against sophisticated cyber threats.
Funding: Supported by the research project "Application of Machine Learning Methods for Early Diagnosis of Pathologies of the Cardiovascular System", funded by the Ministry of Science and Higher Education of the Republic of Kazakhstan (Grant No. IRN AP13068289).
Abstract: This research introduces an innovative ensemble approach, combining Deep Residual Networks (ResNets) and Bidirectional Gated Recurrent Units (BiGRU), augmented with an Attention Mechanism, for the classification of heart arrhythmias. The escalating prevalence of cardiovascular diseases necessitates advanced diagnostic tools to enhance accuracy and efficiency. The model leverages the deep hierarchical feature extraction capabilities of ResNets, which are adept at identifying intricate patterns within electrocardiogram (ECG) data, while BiGRU layers capture the temporal dynamics essential for understanding the sequential nature of ECG signals. The integration of an Attention Mechanism refines the model's focus on critical segments of ECG data, ensuring a nuanced analysis that highlights the most informative features for arrhythmia classification. Evaluated on a comprehensive dataset of 12-lead ECG recordings, our ensemble model demonstrates superior performance in distinguishing between various types of arrhythmias, with an accuracy of 98.4%, a precision of 98.1%, a recall of 98%, and an F-score of 98%. This novel combination of convolutional and recurrent neural networks, supplemented by attention-driven mechanisms, advances automated ECG analysis, contributing significantly to healthcare's machine learning applications and presenting a step forward in developing non-invasive, efficient, and reliable tools for early diagnosis and management of heart diseases.
Funding: This work was supported by the Natural Science Foundation of China (Nos. 62303126, 62362008, 62066006, authors Zhenyong Zhang and Bin Hu, https://www.nsfc.gov.cn/, accessed on 25 July 2024), Guizhou Provincial Science and Technology Projects (No. ZK[2022]149, author Zhenyong Zhang, https://kjt.guizhou.gov.cn/, accessed on 25 July 2024), Guizhou Provincial Research Project (Youth) for Universities (No. [2022]104, author Zhenyong Zhang, https://jyt.guizhou.gov.cn/, accessed on 25 July 2024), and the GZU Cultivation Project of NSFC (No. [2020]80, author Zhenyong Zhang, https://www.gzu.edu.cn/, accessed on 25 July 2024).
Abstract: With the widespread use of machine learning (ML) technology, the operational efficiency and responsiveness of power grids have been significantly enhanced, allowing smart grids to achieve high levels of automation and intelligence. However, tree ensemble models commonly used in smart grids are vulnerable to adversarial attacks, making it urgent to enhance their robustness. To address this, we propose a robustness enhancement method that incorporates physical constraints into the node-splitting decisions of tree ensembles. Our algorithm improves robustness by developing a dataset of adversarial examples that comply with physical laws, ensuring that the training data accurately reflects possible attack scenarios while adhering to physical rules. In our experiments, the proposed method increased robustness against adversarial attacks by 100% when applied to real grid data under physical constraints. These results highlight the advantages of our method in maintaining efficient and secure operation of smart grids under adversarial conditions.
Funding: Supported by the Joint Funds of the Chinese National Natural Science Foundation (NSFC) (Grant No. U2242213), the National Key Research and Development (R&D) Program of the Ministry of Science and Technology of China (Grant No. 2021YFC3000902), and the National Science Foundation for Young Scholars (Grant No. 42205166).
Abstract: Ensemble prediction is widely used to represent the uncertainty of single deterministic Numerical Weather Prediction (NWP) caused by errors in initial conditions (ICs). The traditional Singular Vector (SV) initial perturbation method tends to capture only synoptic-scale initial uncertainty rather than mesoscale uncertainty in global ensemble prediction. To address this issue, a multiscale SV initial perturbation method based on the China Meteorological Administration Global Ensemble Prediction System (CMA-GEPS) is proposed to quantify multiscale initial uncertainty. The multiscale SV initial perturbation approach entails calculating multiscale SVs at different resolutions with multiple linearized physical processes to capture fast-growing perturbations from mesoscale to synoptic scale in target areas, and combining these SVs by using a Gaussian sampling method with amplitude coefficients to generate initial perturbations. Following that, the energy norm, energy spectrum, and structure of multiscale SVs and their impact on GEPS are analyzed based on a batch experiment in different seasons. The results show that the multiscale SV initial perturbations can possess more energy and capture more mesoscale uncertainties than the traditional single-SV method. Meanwhile, multiscale SV initial perturbations can reflect the strongest dynamical instability in target areas. Compared with single-scale SVs, their performance in global ensemble prediction is shown to (i) improve the relationship between the ensemble spread and the root-mean-square error and (ii) provide better probability forecast skill for atmospheric circulation during the late forecast period and for short- to medium-range precipitation. This study provides scientific evidence and application foundations for the design and development of a multiscale SV initial perturbation method for the GEPS.
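The spread and error relationship mentioned in result (i) is commonly examined by comparing the ensemble spread with the root-mean-square error of the ensemble mean. The sketch below shows one conventional way to compute both; it is not taken from the paper, and the array layout (members by grid points) and the function name are assumptions.

```python
import numpy as np

def spread_and_rmse(ensemble, truth):
    """Return (spread, rmse) for one forecast time.

    ensemble: array of shape (n_members, n_points), assumed layout.
    truth:    array of shape (n_points,), the verifying analysis.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    truth = np.asarray(truth, dtype=float)
    ens_mean = ensemble.mean(axis=0)
    # Spread: square root of the domain-averaged ensemble variance.
    spread = np.sqrt(ensemble.var(axis=0, ddof=1).mean())
    # RMSE of the ensemble mean against the verifying values.
    rmse = np.sqrt(((ens_mean - truth) ** 2).mean())
    return spread, rmse

# Usage with a toy two-member ensemble on two grid points.
spread, rmse = spread_and_rmse([[0.0, 2.0], [2.0, 4.0]], [1.0, 3.0])
print(spread, rmse)
```

For a statistically consistent ensemble the two quantities are expected to be of similar size on average, so a spread well below the RMSE indicates under-dispersion.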
Funding: This work is supported by the EIAS (Emerging Intelligent Autonomous Systems) Data Science Lab, Prince Sultan University, Kingdom of Saudi Arabia, which paid the APC.
Abstract: The software development process mostly depends on accurately identifying both essential and optional features. Initially, user needs are typically expressed in free-form language, requiring significant time and human resources to translate them into clear functional and non-functional requirements. To address this challenge, various machine learning (ML) methods have been explored to automate the understanding of these requirements, aiming to reduce time and human effort. However, existing techniques often struggle with complex instructions and large-scale projects. In our study, we introduce an innovative approach known as the Functional and Non-functional Requirements Classifier (FNRC). By combining the traditional random forest algorithm with the Accuracy Sliding Window (ASW) technique, we develop optimal sub-ensembles that surpass the initial classifier's accuracy while using fewer trees. Experimental results demonstrate that our FNRC methodology performs robustly across different datasets, achieving a balanced Precision of 75% on the PROMISE dataset and a Recall of 85% on the CCHIT dataset. Both datasets consistently maintain an F-measure around 64%, highlighting FNRC's ability to effectively balance precision and recall in diverse scenarios. These findings contribute to more accurate and efficient software development processes, increasing the probability of achieving successful project outcomes.
Funding: The authors are grateful to the National Key Research and Development Project (Grant No. 2019YFC1509800) and the National Natural Science Foundation of China (Grant No. 12172211) for their financial support.
Abstract: Geotechnical engineering data are usually small-sample and high-dimensional, which brings many challenges to predictive modeling. This paper uses a typical high-dimensional and small-sample swell pressure (P_s) dataset to explore the possibility of using multi-algorithm hybrid ensemble and dimensionality reduction methods to mitigate the uncertainty of soil parameter prediction. The base learner pool is constructed from six machine learning (ML) algorithms, and four ensemble methods, Stacking (SG), Blending (BG), Voting regression (VR), and Feature weight linear stacking (FWL), are used for the multi-algorithm ensemble. Furthermore, permutation importance is used for feature dimensionality reduction to mitigate the impact of weakly correlated variables on predictive modeling. The results show that the proposed methods are superior to traditional prediction models and base ML models, that FWL is more suitable for modeling with small-sample datasets, and that dimensionality reduction can simplify the data structure and reduce the adverse impact of the small-sample effect, which points the way to feature selection for predictive modeling. Based on the ensemble methods, the feature importances of the five primary factors affecting P_s are the maximum dry density (31.145%), clay fraction (15.876%), swell percent (15.289%), plasticity index (14%), and optimum moisture content (13.69%). The influence of input parameters on P_s is also investigated and is in line with the findings of the existing literature.
Funding: Supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (2021-0-00755, Dark Data Analysis Technology for Data Scale and Accuracy Improvement). This research was also funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project, Number PNURSP2024R407, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Thyroid disorders represent a significant global health challenge, with hypothyroidism and hyperthyroidism as two common conditions arising from dysfunction of the thyroid gland. Accurate and timely diagnosis of these disorders is crucial for effective treatment and patient care. This research introduces a comprehensive approach to improve the accuracy of thyroid disorder diagnosis through the integration of ensemble stacking and advanced feature selection techniques. Sequential forward feature selection, sequential backward feature elimination, and bidirectional feature elimination are investigated in this study. In ensemble learning, random forest, adaptive boosting, and bagging classifiers are employed. The effectiveness of these techniques is evaluated using two different datasets obtained from the University of California Irvine Machine Learning Repository, both of which undergo preprocessing steps, including outlier removal, handling of missing data, data cleansing, and feature reduction. Extensive experimentation demonstrates the success of the proposed ensemble stacking with bidirectional feature elimination, which achieves 100% and 99.86% accuracy in identifying hyperthyroidism and hypothyroidism, respectively. Beyond enhancing detection accuracy, the ensemble stacking model also demonstrated reduced computational complexity, which is pivotal for practical medical applications. It significantly outperformed existing studies with similar objectives, underscoring the viability and effectiveness of the proposed scheme. This research offers an innovative perspective and sets the platform for improved thyroid disorder diagnosis, with broader implications for healthcare and patient well-being.
Funding: Supported by the National Natural Science Foundation of China Youth Fund (12105234).
Abstract: The distribution of the nuclear ground-state spin in a two-body random ensemble (TBRE) was studied using a general classification neural network (NN) model with two-body interaction matrix elements as input features and the corresponding ground-state spins as labels or output predictions. The quantum many-body system problem exceeds the capability of our optimized NNs in terms of accurately predicting the ground-state spin of each sample within the TBRE. However, our NN model effectively captured the statistical properties of the ground-state spin because it learned the empirical regularity of the ground-state spin distribution in TBRE, as discovered by physicists.
Abstract: The Indian Himalayan region is frequently experiencing climate change-induced landslides. Thus, landslide susceptibility assessment assumes greater significance for lessening the impact of a landslide hazard. This paper attempts to assess landslide susceptibility in the Shimla district of the northwest Indian Himalayan region. It examines the effectiveness of random forest (RF), multilayer perceptron (MLP), sequential minimal optimization regression (SMOreg), and bagging ensemble (B-RF, B-SMOreg, B-MLP) models. A landslide inventory map comprising 1052 locations of past landslide occurrences was split into training (70%) and testing (30%) datasets. The site-specific influencing factors were selected by employing a multicollinearity test. The relationship between past landslide occurrences and influencing factors was established using the frequency ratio method. The effectiveness of the machine learning models was verified through performance assessors. The landslide susceptibility maps were validated by the area under the receiver operating characteristic curve (ROC-AUC), accuracy, precision, recall, and F1-score. The key performance metrics and map validation demonstrated that the B-RF model (correlation coefficient: 0.988, mean absolute error: 0.010, root mean square error: 0.058, relative absolute error: 2.964, ROC-AUC: 0.947, accuracy: 0.778, precision: 0.819, recall: 0.917, and F1-score: 0.865) outperformed the single classifiers and other bagging ensemble models for landslide susceptibility. The results show that the largest area was found under the very high susceptibility zone (33.87%), followed by the low (27.30%), high (20.68%), and moderate (18.16%) susceptibility zones. Average annual rainfall, slope, lithology, soil texture, and earthquake magnitude have been identified as the influencing factors for very high landslide susceptibility. Soil texture, lineament density, and elevation have been associated with high and moderate susceptibility. Thus, the study calls for devising suitable landslide mitigation measures in the study area. Structural measures, an immediate response system, community participation, and coordination among stakeholders may help lessen the detrimental impact of landslides. The findings from this study could aid decision-makers in mitigating future catastrophes and devising suitable strategies in other geographical regions with similar geological characteristics.
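As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The helper below is a generic sketch (the function name is hypothetical, not from the paper); applied to the reported precision of 0.819 and recall of 0.917 it reproduces the reported F1-score of 0.865 after rounding.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (F1-score)."""
    if precision + recall == 0:
        return 0.0  # convention when both metrics are zero
    return 2 * precision * recall / (precision + recall)

# Values reported for the bagging random forest model in the abstract.
f1 = f1_from_precision_recall(0.819, 0.917)
print(round(f1, 3))  # 0.865
```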
Abstract: This study investigates the application of deep learning, ensemble learning, metaheuristic optimization, and image processing techniques for detecting lung and colon cancers, aiming to enhance treatment efficacy and improve survival rates. We introduce a metaheuristic-driven two-stage ensemble deep learning model for efficient lung/colon cancer classification. The diagnosis of lung and colon cancers is attempted using several unique indicators, with different versions of deep Convolutional Neural Networks (CNNs) used for feature extraction and model construction, and various Machine Learning (ML) algorithms used for final classification. Specifically, we consider different scenarios consisting of two-class colon cancer, three-class lung cancer, and five-class combined lung/colon cancer, and conduct feature extraction using four CNNs. These extracted features are then integrated to create a comprehensive feature set. In the next step, feature selection is optimized using a metaheuristic algorithm based on Electric Eel Foraging Optimization (EEFO). This optimized feature subset is subsequently employed in various ML algorithms to determine the most effective ones through a rigorous evaluation process. The top-performing algorithms are refined using the High-Performance Filter (HPF) and integrated into an ensemble learning framework employing weighted averaging. Our findings indicate that the proposed ensemble learning model significantly surpasses existing methods in classification accuracy across all datasets, achieving accuracies of 99.85% for the two-class, 98.70% for the three-class, and 98.96% for the five-class datasets.
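The final weighted-averaging step can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' pipeline: the inputs are assumed to be per-model class-probability arrays, and the weights are hypothetical (for example, validation accuracies) and are normalized inside the function.

```python
import numpy as np

def weighted_average_ensemble(probas, weights):
    """Combine class probabilities from several models.

    probas:  array of shape (n_models, n_samples, n_classes).
    weights: one non-negative weight per model (hypothetical values).
    Returns the averaged probabilities and the predicted class labels.
    """
    probas = np.asarray(probas, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to one
    # Contract the model axis: result has shape (n_samples, n_classes).
    avg = np.tensordot(w, probas, axes=1)
    return avg, avg.argmax(axis=1)

# Usage: two models, one sample, two classes; the second model counts 3x.
avg, labels = weighted_average_ensemble(
    [[[0.6, 0.4]], [[0.2, 0.8]]], weights=[1.0, 3.0]
)
print(avg, labels)
```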
Funding: This research project was funded by the Deanship of Scientific Research, Princess Nourah bint Abdulrahman University, through the Program of Research Project Funding After Publication, grant No. (43-PRFA-P-58).
Abstract: This study presents a layered generalization ensemble model for next-generation radio mobiles, focusing on supervised channel estimation approaches. Channel estimation typically involves the insertion of pilot symbols with a well-balanced rhythm and suitable layout. The model, called Stacked Generalization for Channel Estimation (SGCE), aims to enhance channel estimation performance by eliminating pilot insertion and improving throughput. The SGCE model incorporates six machine learning methods: random forest (RF), gradient boosting machine (GB), light gradient boosting machine (LGBM), support vector regression (SVR), extremely randomized tree (ERT), and extreme gradient boosting (XGB). By generating meta-data from five models (RF, GB, LGBM, SVR, and ERT), we ensure accurate channel coefficient predictions using the XGB model. To validate the modeling performance, we employ the leave-one-out cross-validation (LOOCV) approach, where each observation serves as the validation set while the remaining observations act as the training set. The SGCE results demonstrate higher mean and median accuracy compared to the separate models. SGCE achieves an average accuracy of 98.4%, a precision of 98.1%, and the highest F1-score of 98.5%, accurately predicting channel coefficients. Furthermore, our proposed method outperforms prior traditional and intelligent techniques in terms of throughput and bit error rate. SGCE's superior performance highlights its efficacy in optimizing channel estimation. It can effectively predict channel coefficients and contribute to enhancing the overall efficiency of radio mobile systems. Through extensive experimentation and evaluation, we demonstrate that SGCE improves performance in channel estimation, surpassing previous techniques. Accordingly, SGCE's capabilities have significant implications for optimizing channel estimation in modern communication systems.
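The LOOCV procedure described in this abstract can be sketched in a few lines. The example below is only an illustration of the validation loop: it substitutes an ordinary least-squares fit for the stacked SGCE model, and the function names are hypothetical.

```python
import numpy as np

def loocv_splits(n):
    """Yield (train_indices, held_out_index) for each of n observations."""
    for i in range(n):
        train = np.concatenate([np.arange(i), np.arange(i + 1, n)])
        yield train, i

def loocv_mse(X, y):
    """Leave-one-out mean squared error of a linear least-squares model.

    Assumption: a plain linear model with intercept stands in for the
    stacked ensemble; only the validation loop is the point here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = []
    for train, i in loocv_splits(len(y)):
        # Fit on every observation except the held-out one.
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        # Predict the single held-out observation.
        pred = np.concatenate([[1.0], X[i]]) @ coef
        errors.append((pred - y[i]) ** 2)
    return float(np.mean(errors))

# Usage on synthetic, exactly linear data (error should be near zero).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1]
print(loocv_mse(X, y))
```

LOOCV refits the model once per observation, so its cost grows linearly with the sample size, which is one reason it is usually reserved for small datasets.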
Funding: Key Project of the National Natural Science Foundation of China (42330611) and the National Natural Science Foundation of China (42105008).
Abstract: This study investigated the growth of forecast errors stemming from initial conditions (ICs), lateral boundary conditions (LBCs), and model (MO) perturbations, as well as their interactions, by conducting seven 36 h convection-allowing ensemble forecast (CAEF) experiments. Two cases that occurred over the Yangtze-Huai River basin (YHRB) in East China, one with strong forcing (SF) and the other with weak forcing (WF), were selected to examine the sources of uncertainties associated with perturbation growth under varying forcing backgrounds and the influence of these backgrounds on growth. The perturbations exhibited distinct characteristics in terms of temporal evolution, spatial propagation, and vertical distribution under different forcing backgrounds, indicating a dependence of perturbation growth on the forcing background. A comparison of the perturbation growth in different precipitation areas revealed that IC and LBC perturbations were significantly influenced by the location of precipitation in the SF case, while MO perturbations were more responsive to convection triggering and dominated in the WF case. The vertical distribution of perturbations showed that the sources of uncertainties and the performance of perturbations varied between the SF and WF cases, with LBC perturbations displaying notable case dependence. Furthermore, the interactions between perturbations were considered by exploring the added values of different source perturbations. For the SF case, the added values of IC, LBC, and MO perturbations were reflected in different forecast periods and different source uncertainties, suggesting that the combination of multi-source perturbations can yield positive interactions. In the WF case, MO perturbations provided a more accurate estimation of uncertainties downstream of the Dabie Mountain and need to be prioritized in research on perturbation development.