Data security assurance is crucial given the increasing prevalence of cloud computing and its widespread use across industries, especially in light of the growing number of cybersecurity threats. A major and ever-present threat is Ransomware-as-a-Service (RaaS) attacks, which enable even individuals with minimal technical knowledge to conduct ransomware operations. This study provides a new approach for RaaS attack detection that uses an ensemble of deep learning models. For this purpose, the network intrusion detection dataset “UNSW-NB15” from the Intelligent Security Group of the University of New South Wales, Australia, is analyzed. In the initial phase, three separate Multi-Layer Perceptron (MLP) models are developed, based on the rectified linear unit, scaled exponential linear unit, and exponential linear unit activations, respectively. Then, using the combined predictive power of these three MLPs, the RansoDetect Fusion ensemble model is introduced. The proposed ensemble technique outperforms previous studies, with 98.79% accuracy and recall, 98.85% precision, and 98.80% F1-score. The empirical results validate the ensemble model’s ability to improve cybersecurity defenses by showing that it outperforms the individual MLP models. This research highlights the significance of combined deep learning models in strengthening intrusion detection systems against sophisticated cyber threats.
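The abstract above does not say how the three MLPs are fused, so the following is only a minimal sketch under an assumed rule: soft voting, i.e. averaging the per-class probabilities of the three models. The three activation functions are written out for reference; the function names and the example probabilities are hypothetical and do not come from the paper.

```python
import math

# Standard SELU constants (Klambauer et al.); relu/elu/selu are the three
# activations named in the abstract.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def elu(x, alpha=1.0):
    """Exponential linear unit."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def soft_vote(prob_lists):
    """Average per-class probabilities across models (assumed fusion rule).

    Returns the averaged probabilities and the index of the winning class.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: avg[c])
    return avg, label


# Hypothetical [benign, attack] probabilities from the three MLPs for one flow.
relu_mlp = [0.40, 0.60]
selu_mlp = [0.55, 0.45]
elu_mlp = [0.30, 0.70]
avg_probs, predicted = soft_vote([relu_mlp, selu_mlp, elu_mlp])
```

In this toy case one model leans towards "benign", but the averaged probabilities still favour class 1 ("attack"), which is the kind of disagreement an ensemble is meant to absorb.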
A redundant-subspace-weighting (RSW)-based approach is proposed to enhance the frequency stability of the time scale of a clock ensemble. In this method, multiple overlapping subspaces are constructed in the clock ensemble, and the weight of each clock in the ensemble is defined by using the spatial covariance matrix. The superimposed average of covariances in different subspaces reduces the correlations between clocks in the same laboratory to some extent. After optimizing the parameters of this weighting procedure, the frequency stabilities of virtual clock ensembles are significantly improved in most cases.
Geotechnical engineering data are usually small-sample and high-dimensional, which poses many challenges for predictive modeling. This paper uses a typical high-dimensional, small-sample swell pressure (P_s) dataset to explore whether multi-algorithm hybrid ensembles and dimensionality reduction can mitigate the uncertainty of soil parameter prediction. A base learner pool is constructed from six machine learning (ML) algorithms, and four ensemble methods, Stacking (SG), Blending (BG), Voting regression (VR), and Feature weight linear stacking (FWL), are used for the multi-algorithm ensemble. Furthermore, permutation importance is used for feature dimensionality reduction to mitigate the impact of weakly correlated variables on predictive modeling. The results show that the proposed methods are superior to traditional prediction models and base ML models; FWL is more suitable for modeling with small-sample datasets, and dimensionality reduction can simplify the data structure and reduce the adverse impact of the small-sample effect, which points the way to feature selection for predictive modeling. Based on the ensemble methods, the five primary factors affecting P_s, ranked by feature importance, are the maximum dry density (31.145%), clay fraction (15.876%), swell percent (15.289%), plasticity index (14%), and optimum moisture content (13.69%). The influence of the input parameters on P_s is also investigated and is in line with the findings of the existing literature.
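Permutation importance, mentioned in the abstract above, can be sketched in a few lines: shuffle one feature column at a time and record how much the prediction error grows. This is a generic illustration rather than the paper's implementation; the toy model, the data, and the choice of mean squared error as the score are assumptions made for the example.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    predict: callable mapping one feature row (a list) to a prediction.
    X: list of feature rows (lists); y: list of targets.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy model that uses only feature 0 and ignores feature 1.
def toy_model(row):
    return 3.0 * row[0]


X_toy = [[float(i), float((i * 7) % 5)] for i in range(10)]
y_toy = [3.0 * row[0] for row in X_toy]
importance = permutation_importance(toy_model, X_toy, y_toy)
```

Because the toy model never reads feature 1, shuffling it leaves the error unchanged and its importance is exactly zero, while feature 0 receives a positive score. Features with near-zero importance are the natural candidates to drop during dimensionality reduction.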
Thyroid disorders represent a significant global health challenge, with hypothyroidism and hyperthyroidism as two common conditions arising from dysfunction of the thyroid gland. Accurate and timely diagnosis of these disorders is crucial for effective treatment and patient care. This research introduces a comprehensive approach to improve the accuracy of thyroid disorder diagnosis through the integration of ensemble stacking and advanced feature selection techniques. Sequential forward feature selection, sequential backward feature elimination, and bidirectional feature elimination are investigated in this study. For ensemble learning, random forest, adaptive boosting, and bagging classifiers are employed. The effectiveness of these techniques is evaluated using two different datasets obtained from the University of California Irvine Machine Learning Repository, both of which undergo preprocessing steps including outlier removal, handling of missing data, data cleansing, and feature reduction. Extensive experimentation shows that the proposed ensemble stacking with bidirectional feature elimination achieves 100% and 99.86% accuracy in identifying hyperthyroidism and hypothyroidism, respectively. Beyond enhancing detection accuracy, the ensemble stacking model also has reduced computational complexity, which is pivotal for practical medical applications. It significantly outperformed existing studies with similar objectives, underscoring the viability and effectiveness of the proposed scheme. This research offers an innovative perspective and sets the platform for improved thyroid disorder diagnosis, with broader implications for healthcare and patient well-being.
Ensemble prediction is widely used to represent the uncertainty of single deterministic Numerical Weather Prediction (NWP) caused by errors in initial conditions (ICs). The traditional Singular Vector (SV) initial perturbation method tends to capture only synoptic-scale initial uncertainty rather than mesoscale uncertainty in global ensemble prediction. To address this issue, a multiscale SV initial perturbation method based on the China Meteorological Administration Global Ensemble Prediction System (CMA-GEPS) is proposed to quantify multiscale initial uncertainty. The multiscale SV initial perturbation approach entails calculating multiscale SVs at different resolutions with multiple linearized physical processes to capture fast-growing perturbations from mesoscale to synoptic scale in target areas, and combining these SVs by using a Gaussian sampling method with amplitude coefficients to generate initial perturbations. Following that, the energy norm, energy spectrum, and structure of multiscale SVs and their impact on GEPS are analyzed based on a batch experiment in different seasons. The results show that the multiscale SV initial perturbations can possess more energy and capture more mesoscale uncertainties than the traditional single-SV method. Meanwhile, multiscale SV initial perturbations can reflect the strongest dynamical instability in target areas. Compared with single-scale SVs, their performance in global ensemble prediction is shown to (i) improve the relationship between the ensemble spread and the root-mean-square error and (ii) provide better probability forecast skill for atmospheric circulation during the late forecast period and for short- to medium-range precipitation. This study provides scientific evidence and application foundations for the design and development of a multiscale SV initial perturbation method for the GEPS.
Accurate wind power forecasting is critical for system integration and stability as reliance on renewable energy grows. Traditional approaches frequently struggle with complex data and non-linear relationships. This article presents a novel approach to hybrid ensemble learning that is based on rigorous requirements engineering concepts. The approach identifies significant parameters influencing forecasting accuracy by evaluating real-time Modern-Era Retrospective Analysis for Research and Applications (MERRA2) data from several European wind farms, using in-depth stakeholder research and requirements elicitation. Ensemble learning is used to develop a robust model, while a temporal convolutional network handles time-series complexities and data gaps. The ensemble-temporal neural network is tuned through different input parameters, including training layers, hidden and dropout layers, and activation and loss functions. The proposed framework is further analyzed by comparing it with state-of-the-art forecasting models in terms of Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The performance indicators show that the proposed model achieves error reductions of approximately 16.67%, 28.57%, and 81.92% for MAE, and 38.46%, 17.65%, and 90.78% for RMSE, for MERRA wind farms 1, 2, and 3, respectively, compared to other existing methods. These quantitative results show the effectiveness of the proposed model, with MAE values ranging from 0.0010 to 0.0156 and RMSE values ranging from 0.0014 to 0.0174. This work highlights the effectiveness of requirements engineering in wind power forecasting, leading to enhanced forecast accuracy and grid stability and ultimately paving the way for more sustainable energy solutions.
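The error-reduction percentages quoted above are presumably relative reductions against a baseline model's error; the abstract does not state the formula, so the usual definition is assumed here. The baseline and proposed MAE values in the example are hypothetical and chosen only to show the arithmetic.

```python
def error_reduction_pct(baseline_error, proposed_error):
    """Relative error reduction in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical values: a baseline MAE of 0.0012 against a proposed MAE of 0.0010.
reduction = error_reduction_pct(0.0012, 0.0010)
print(f"{reduction:.2f}%")
```

With these illustrative inputs the reduction comes out at about 16.67%, which shows how a small absolute difference in MAE can correspond to a double-digit relative improvement.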
With the advancement of artificial intelligence, traffic forecasting is attracting growing interest as a means of optimizing route planning and enhancing service quality. Traffic volume is an influential parameter for planning and operating traffic infrastructure. This study proposes an improved ensemble-based deep learning method to solve traffic volume prediction problems. A set of optimal hyperparameters is also applied to improve the performance of the learning process. The fusion of these methodologies aims to harness the capacity of ensemble empirical mode decomposition to discern complex traffic patterns and the proficiency of long short-term memory in learning temporal relationships. First, an automatic vehicle identification dataset is obtained and preprocessed with the ensemble empirical mode decomposition model. Second, traffic volume is predicted using the long short-term memory algorithm. Next, the study employs a trial-and-error approach to select a set of optimal hyperparameters, including the lookback window, the number of neurons in the hidden layers, and the gradient descent optimizer. Finally, the obtained results are fused into a final traffic volume prediction. The experimental results show that the proposed method outperforms other benchmarks on various evaluation measures, including mean absolute error, root mean squared error, mean absolute percentage error, and R-squared. The achieved R-squared value reaches 98%, and the other evaluation indices surpass those of the competing methods. These findings highlight the accuracy of the traffic pattern prediction and offer promising prospects for enhancing transportation management systems and urban infrastructure planning.
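The four evaluation measures named above have standard definitions, sketched here in plain Python. The observed and predicted traffic volumes are hypothetical numbers used only to exercise the functions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must contain no zeros."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical hourly traffic volumes (vehicles/hour) and model predictions.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "R2": r_squared(observed, predicted),
}
```

MAE and RMSE are in the units of the target, MAPE is scale-free but undefined when an observed value is zero, and R-squared measures the share of variance explained, so the four are usually reported together.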
The distribution of the nuclear ground-state spin in a two-body random ensemble (TBRE) was studied using a general classification neural network (NN) model, with two-body interaction matrix elements as input features and the corresponding ground-state spins as labels or output predictions. The quantum many-body problem exceeds the capability of our optimized NNs in terms of accurately predicting the ground-state spin of each sample within the TBRE. However, our NN model effectively captured the statistical properties of the ground-state spin because it learned the empirical regularity of the ground-state spin distribution in the TBRE, as discovered by physicists.
Sentence classification is the process of categorizing a sentence based on its context. Sentence categorization relies more on semantic features than other tasks such as dependency parsing, which relies more on syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing the progress of the conversation, or comparing impacts. Here, an ensemble of pre-trained language models is used to classify the sentences of a conversation corpus. The conversational sentences are classified into four categories: information, question, directive, and commission. These classification label sequences are used for analyzing the progress of the conversation and predicting its pecking order. An ensemble of Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Generative Pre-Trained Transformer (GPT), DistilBERT, and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus, and hyperparameter tuning is carried out for better sentence classification performance. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT, and XLNet transformer models. The proposed ensemble model with the fine-tuned parameters achieved an F1 score of 0.88.
Typically, the relationship between well logs and lithofacies is complex, which leads to low accuracy of lithofacies identification. Machine learning (ML) methods are often applied to identify lithofacies using logs labelled by rock cores. However, the accuracy of these methods is limited to some extent. To further improve their accuracy, a practical and novel ensemble learning strategy and a set of principles are proposed in this work, which allow geologists not familiar with ML to establish a good ML lithofacies identification model and help geologists familiar with ML further improve the accuracy of lithofacies identification. The ensemble learning strategy combines ML methods as sub-classifiers to generate a comprehensive lithofacies identification model, which aims to reduce the variance errors in prediction. Each sub-classifier is trained on randomly sampled labelled data with random features. The novelty of this work lies in the ensemble principles, which make the sub-classifiers just overfit through algorithm parameter setting and sub-dataset sampling. The principles can help reduce the bias errors in the prediction. Two issues are discussed, namely (1) whether only a relatively simple single-classifier method can be used as a sub-classifier, and how to select proper ML methods as sub-classifiers; and (2) whether different kinds of ML methods can be combined as sub-classifiers and, if so, how to determine a proper combination. To test the effectiveness of the ensemble strategy and principles for lithofacies identification, different kinds of machine learning algorithms are selected as sub-classifiers, including regular classifiers (LDA, NB, KNN, ID3 tree, and CART), a kernel method (SVM), and ensemble learning algorithms (RF, AdaBoost, XGBoost, and LightGBM). The experiments used a published lithofacies dataset from the Daniudi gas field (DGF) in the Ordos Basin, China. Based on a series of comparisons between ML algorithms and their corresponding ensemble models built with the ensemble strategy and principles, the following conclusions are drawn: (1) not only decision trees but also other single classifiers and ensemble-learning classifiers can be used as sub-classifiers of homogeneous ensemble learning, and the ensemble can improve the accuracy of the original classifiers; (2) the ensemble principles for the introduced homogeneous and heterogeneous ensemble strategy are effective in promoting ML in lithofacies identification; (3) in practice, the heterogeneous ensemble is more suitable for building a more powerful lithofacies identification model, though it is complex.
Given the increased number of COVID-19 cases, Ensemble Machine Learning (EML) could be an effective tool for combating the pandemic. An ensemble of classifiers can improve on the performance of single machine learning (ML) classifiers, especially with stacking-based ensemble learning. Stacking uses heterogeneous base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction. However, building an ensemble often causes model performance to decrease when a growing number of learners is not properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacking-based ensemble learning optimized by a Genetic Algorithm (GA), called GA-Stacking, for outbreak prediction and health decision-aiding processes. GA-Stacking uses five well-known models at its first level: Decision Tree (DT), Random Forest (RF), Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost). It also uses the GA to determine the number, combination, and trust of these base models, with the Mean Squared Error (MSE) as the fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) model is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance, with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model achieved good performance when compared with existing bagging-based approaches. The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures. (CMC, 2023, vol. 74, no. 2)
Deep neural networks have achieved tremendous success in various fields, and the structure of these networks is a key factor in that success. In this paper, we focus on ensemble learning based on deep network structure and propose a new deep network ensemble framework (DNEF). Unlike other ensemble learning models, DNEF is an ensemble learning architecture built on a network structure, with serial iteration between the hidden layers, while base classifiers are trained in parallel within each hidden layer. Specifically, DNEF uses randomly sampled data as input and implements serial iteration based on a weighting strategy between hidden layers. In the hidden layers, each node represents a base classifier, and multiple nodes generate training data for the next hidden layer according to a transfer strategy. DNEF operates on two strategies: (1) the weighting strategy calculates the training instance weights of the nodes according to their weaknesses in the previous layer; (2) the transfer strategy adaptively selects each node’s weighted instances as transfer instances and transfer weights, which are combined with the training data of the nodes as input for the next hidden layer. These two strategies improve the accuracy and generalization of DNEF. The ensemble of all nodes is taken as the final output of DNEF. The experimental results reveal that DNEF achieves high accuracy and surpasses both traditional ensemble models and recent deep ensemble methods.
Based on a simple coupled Lorenz model, we investigate how to assess a suitable initial perturbation scheme for ensemble forecasting in a multiscale system involving slow dynamics and fast dynamics. Four initial perturbation approaches are used in the ensemble forecasting experiments: the random perturbation (RP), the bred vector (BV), the ensemble transform Kalman filter (ETKF), and the nonlinear local Lyapunov vector (NLLV) methods. Results show that, regardless of the method used, the ensemble averages behave indistinguishably from the control forecasts during the first few time steps. Due to different error growth in systems with different time scales, the ensemble averages perform better than the control forecast after very short lead times in the fast subsystem but only after a relatively long period of time in the slow subsystem. Due to the coupled dynamic processes, adding perturbations to either fast variables or slow variables can improve the forecasting skill for both fast variables and slow variables. Regarding the initial perturbation approaches, the NLLVs show higher forecasting skill than the BVs or RPs overall. The NLLVs and ETKFs had nearly equivalent prediction skill, but the NLLVs performed best by a narrow margin. In particular, when adding perturbations to slow variables, the independent perturbations (NLLVs and ETKFs) perform much better in ensemble prediction. These results have straightforward implications for a real coupled air–sea model: for the prediction of oceanic variables, using independent perturbations (NLLVs) and adding perturbations to oceanic variables are expected to result in better ensemble prediction performance.
A probabilistic multi-dimensional selective ensemble learning method and its application to predicting users' online purchase behavior are studied in this work. First, the classifiers are integrated along the dimension of predicted probability, and a pruning algorithm based on greedy forward search is obtained by combining the two indicators of accuracy and complementarity. Then the pruning algorithm is integrated into the Stacking ensemble method to establish a prediction model of users' online shopping behavior based on the probabilistic multi-dimensional selective ensemble method. Finally, the proposed method is compared with the prediction results of the individual learners in the ensemble and with the Stacking ensemble method without pruning. The experimental results show that the proposed method can reduce the scale of the ensemble, improve the prediction accuracy of the model, and predict users' online purchase behavior.
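Greedy forward search for ensemble pruning can be sketched as follows. This is a simplified illustration, not the paper's algorithm: learners are added one at a time to an averaged-probability ensemble, and validation accuracy is the only selection criterion. The abstract does not define the complementarity indicator, so it is omitted here; the function names and toy data are hypothetical.

```python
def ensemble_accuracy(selected, probs, labels):
    """Accuracy of the averaged positive-class probability, thresholded at 0.5."""
    correct = 0
    for i, label in enumerate(labels):
        avg = sum(probs[m][i] for m in selected) / len(selected)
        correct += int((avg >= 0.5) == bool(label))
    return correct / len(labels)


def greedy_forward_prune(probs, labels):
    """Add learners one by one while validation accuracy strictly improves.

    probs[m][i] is learner m's predicted probability that sample i is positive.
    Returns the selected learner indices (in selection order) and the accuracy.
    """
    selected, best = [], 0.0
    remaining = set(range(len(probs)))
    while remaining:
        candidate, candidate_score = None, best
        for m in sorted(remaining):
            score = ensemble_accuracy(selected + [m], probs, labels)
            if score > candidate_score:
                candidate, candidate_score = m, score
        if candidate is None:  # no remaining learner improves the ensemble
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best


# Hypothetical validation outputs: learner 2 is an exact duplicate of learner 0.
val_labels = [1, 1, 0, 0, 1, 0]
val_probs = [
    [0.9, 0.4, 0.2, 0.6, 0.8, 0.1],
    [0.6, 0.9, 0.3, 0.1, 0.3, 0.2],
    [0.9, 0.4, 0.2, 0.6, 0.8, 0.1],
]
chosen, accuracy = greedy_forward_prune(val_probs, val_labels)
```

In this toy case the search keeps learners 1 and 0, whose errors fall on different samples, and discards the duplicate learner 2 because it adds nothing. That is the intuition behind pairing accuracy with complementarity when pruning an ensemble.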
The Internet of Things (IoT) will have cultural, commercial, and social effects on life in the future. The nodes participating in an IoT network are natural targets for cyber-attacks. Detecting attacks and identifying anomalies in IoT infrastructure is a growing problem in the IoT domain. A Machine Learning Based Ensemble Intrusion Detection (MLEID) method is applied to address this problem by minimizing malicious actions in botnet attacks on the Message Queue Telemetry Transport (MQTT) and Hyper-Text Transfer Protocol (HTTP) protocols. The proposed work makes two significant contributions: feature selection and attack detection. In the feature selection stage, new features are chosen by Improved Ant Colony Optimization (IACO), and attacks are then detected based on a combination of their possible properties. The IACO approach focuses on identifying the attacker’s important features against HTTP and MQTT. In the IACO algorithm, the constant factor is calculated for HTTP and MQTT based on the mean function for each element. For attack detection, the performance of several machine learning models, namely Distance Decision Tree (DDT), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Mahalanobis Distance Support Vector Machine (MDSVM), was compared in accurately predicting attacks on the IoT network. The outcomes of these classifiers are combined into the ensemble model. The proposed MLEID strategy effectively identifies malicious incidents. The UNSW-NB15 dataset is used to test the MLEID technique using data from simulated IoT sensors. In addition, the proposed MLEID technique has a higher detection rate and a lower false-positive rate than other conventional techniques.
This study investigates the influences of urban land cover on the extreme rainfall event over Zhengzhou city in central China on 20 July 2021 using the Weather Research and Forecasting model at a convection-permitting scale [1-km resolution in the innermost domain (d3)]. Two ensembles of simulations (CTRL, NURB), each consisting of 11 members with a multi-layer urban canopy model and various combinations of physics schemes, were conducted using different land cover scenarios: (i) the real urban land cover, and (ii) all cities in d3 replaced with natural land cover. The results suggest that CTRL reasonably reproduces the spatiotemporal evolution of rainstorms and the 24-h rainfall accumulation over the key region, although the maximum hourly rainfall is underestimated and displaced to the west or southwest by most members. The ensemble mean 24-h rainfall accumulation over the key region of heavy rainfall is reduced by 13%, and the maximum hourly rainfall simulated by each member is reduced by 15–70 mm, in CTRL relative to NURB. The reduction in the simulated rainfall by urbanization is closely associated with numerous cities and towns to the south, southeast, and east of Zhengzhou. Their heating effects jointly lead to the formation of anomalous upward motions in and above the planetary boundary layer (PBL), which exaggerates the PBL drying effect due to reduced evapotranspiration and also enhances the wind stilling effect due to increased surface friction in urban areas. As a result, the lateral inflows of moisture and high-θe (equivalent potential temperature) air from the south and east to Zhengzhou are reduced.
The large blast furnace is essential equipment in iron and steel manufacturing. Due to the complex operation process and frequent fluctuations of variables, conventional monitoring methods often produce false alarms. To address this problem, an ensemble of greedy dynamic principal component analysis-Gaussian mixture model (EGDPCA-GMM) is proposed in this paper. First, PCA-GMM is introduced to deal with the collinearity and the non-Gaussian distribution of blast furnace data. Second, in order to account for the dynamics of the data, a greedy algorithm is used to determine the extended variables and their corresponding time lags, so as to avoid introducing unnecessary noise. Then a bagging ensemble is adopted together with the greedy extension to eliminate the randomness introduced by the greedy algorithm and further reduce the false alarm rate (FAR) of the monitoring results. Finally, the algorithm is applied to the blast furnace of a large iron and steel group in South China to verify its performance. Compared with the basic algorithms, the proposed method achieves the lowest FAR while keeping the missed alarm rate (MAR) stable.
Compton camera-based prompt gamma (PG) imaging has been proposed for range verification during proton therapy. However, a deviation between the PG and dose distributions, as well as the difference between the reconstructed PG and exact values, limits the effectiveness of the approach for accurate range monitoring in clinical applications. The aim of the study was to realize PG-based dose reconstruction with a Compton camera, thereby further improving the prediction accuracy of in vivo range verification and providing a novel method for beam monitoring during proton therapy. In this paper, we present an approach based on a subset-driven origin ensemble with resolution recovery and a double evolutionary algorithm to reconstruct the dose depth profile (DDP) from the gamma events obtained by a cadmium-zinc-telluride Compton camera with limited position and energy resolution. Simulations of proton pencil beams with a clinical particle rate irradiating phantoms made of different materials and a CT-based thoracic phantom were used to evaluate the feasibility of the proposed method. The results show that, for a monoenergetic proton pencil beam irradiating a homogeneous-material box phantom, the accuracy of the reconstructed DDP was within 0.3 mm for range prediction and within 5.2% for dose prediction. In particular, for 1.6-Gy irradiation in the therapy simulation of thoracic tumors, the range deviation of the reconstructed spread-out Bragg peak was within 0.8 mm, and the relative dose deviation in the peak area was less than 7% compared to the exact values. The results demonstrate the potential and feasibility of the proposed method for future Compton-based accurate dose reconstruction and range verification during proton therapy.
The accurate prediction of soybean yield is of great significance for agricultural production, monitoring, and early warning. Although previous studies have used machine learning algorithms to predict soybean yield based on meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms to improve prediction accuracy through ensemble learning has not been studied in depth. This study used and analyzed daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang–Huai region), covering 34 years. Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as the base models to establish a high-precision and highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated using five-year sliding prediction and four regression indicators for the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield based on the stacking model in the 173 counties showed that the predictions reflect the spatiotemporal distribution of soybean yield in detail, with a mean absolute percentage error (MAPE) of less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
Funding: Funded by the Deanship of Scientific Research, Najran University, Kingdom of Saudi Arabia, under the Research Groups Funding Program, Grant Code Number (NU/RG/SERC/12/43).
Abstract: Data security assurance is crucial given the increasing prevalence of cloud computing and its widespread use across industries, especially in light of the growing number of cybersecurity threats. A major and ever-present threat is Ransomware-as-a-Service (RaaS) attacks, which enable even individuals with minimal technical knowledge to conduct ransomware operations. This study provides a new approach for RaaS attack detection that uses an ensemble of deep learning models. For this purpose, the network intrusion detection dataset "UNSW-NB15" from the Intelligent Security Group of the University of New South Wales, Australia, is analyzed. In the initial phase, three separate Multi-Layer Perceptron (MLP) models are developed, based on the rectified linear unit, the scaled exponential linear unit, and the exponential linear unit, respectively. Then, using the combined predictive power of these three MLPs, the RansoDetect Fusion ensemble model is introduced. The proposed ensemble technique outperforms previous studies, with 98.79% accuracy and recall, 98.85% precision, and a 98.80% F1-score. The empirical results validate the ensemble model's ability to improve cybersecurity defenses by showing that it outperforms the individual MLP models. This research highlights the significance of combined deep learning models in strengthening intrusion detection systems against sophisticated cyber threats.
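The abstract above does not state how RansoDetect Fusion combines its three MLPs. As a minimal sketch, assuming plain soft voting (averaging the class probabilities of the members), the three activation functions and the fusion step might look as follows in Python. The function names and the example probabilities are illustrative assumptions, not values from the paper.

```python
import math

# Standard SELU constants (fixed values, not tunable hyperparameters)
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)

def elu(x, alpha=1.0):
    """Exponential linear unit."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def soft_vote(member_probs):
    """Average per-class probabilities over members; return (label, averages)."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    avg = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: avg[c])
    return label, avg

# Hypothetical outputs [P(benign), P(attack)] of the three MLPs for one flow
probs = [[0.40, 0.60], [0.55, 0.45], [0.20, 0.80]]
label, avg = soft_vote(probs)  # label 1 (attack): the averaged P(attack) is highest
```

Soft voting lets a confident member outweigh an uncertain one, which is one plausible reason an ensemble of differently activated MLPs could beat each individual model; weighted averaging or a learned meta-classifier would be equally valid choices.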
Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2021YFB3900701), the Science and Technology Plan Project of the State Administration for Market Regulation of China (Grant No. 2023MK178), and the National Natural Science Foundation of China (Grant No. 42227802).
Abstract: A redundant-subspace-weighting (RSW)-based approach is proposed to enhance the frequency stability of the time scale of a clock ensemble. In this method, multiple overlapping subspaces are constructed in the clock ensemble, and the weight of each clock in the ensemble is defined using the spatial covariance matrix. The superimposition average of covariances in different subspaces reduces the correlations between clocks in the same laboratory to some extent. After the parameters of this weighting procedure are optimized, the frequency stabilities of virtual clock ensembles are significantly improved in most cases.
Funding: Supported by the National Key Research and Development Project (Grant No. 2019YFC1509800) and the National Natural Science Foundation of China (Grant No. 12172211).
Abstract: Geotechnical engineering data are usually small-sample and high-dimensional, which poses many challenges for predictive modeling. This paper uses a typical high-dimensional, small-sample swell pressure (Ps) dataset to explore the possibility of using multi-algorithm hybrid ensemble and dimensionality reduction methods to mitigate the uncertainty of soil parameter prediction. A base learner pool is constructed from six machine learning (ML) algorithms, and four ensemble methods, Stacking (SG), Blending (BG), Voting regression (VR), and Feature weight linear stacking (FWL), are used for the multi-algorithm ensemble. Furthermore, permutation importance is used for feature dimensionality reduction to mitigate the impact of weakly correlated variables on predictive modeling. The results show that the proposed methods are superior to traditional prediction models and base ML models, that FWL is more suitable for modeling with small-sample datasets, and that dimensionality reduction can simplify the data structure and reduce the adverse impact of the small-sample effect, which points the way to feature selection for predictive modeling. Based on the ensemble methods, the feature importances of the five primary factors affecting Ps are the maximum dry density (31.145%), clay fraction (15.876%), swell percent (15.289%), plasticity index (14%), and optimum moisture content (13.69%). The influence of the input parameters on Ps is also investigated and is in line with the findings of the existing literature.
Funding: Supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (2021-0-00755, Dark Data Analysis Technology for Data Scale and Accuracy Improvement). This research was also funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R407), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Thyroid disorders represent a significant global health challenge, with hypothyroidism and hyperthyroidism as two common conditions arising from dysfunction of the thyroid gland. Accurate and timely diagnosis of these disorders is crucial for effective treatment and patient care. This research introduces a comprehensive approach to improving the accuracy of thyroid disorder diagnosis through the integration of ensemble stacking and advanced feature selection techniques. Sequential forward feature selection, sequential backward feature elimination, and bidirectional feature elimination are investigated. For ensemble learning, random forest, adaptive boosting, and bagging classifiers are employed. The effectiveness of these techniques is evaluated using two different datasets obtained from the University of California Irvine Machine Learning Repository, both of which undergo preprocessing steps including outlier removal, handling of missing data, data cleansing, and feature reduction. Extensive experimentation shows that the proposed combination of ensemble stacking and bidirectional feature elimination achieves 100% and 99.86% accuracy in identifying hyperthyroidism and hypothyroidism, respectively. Beyond enhancing detection accuracy, the ensemble stacking model also shows reduced computational complexity, which is pivotal for practical medical applications. It significantly outperformed existing studies with similar objectives, underscoring the viability and effectiveness of the proposed scheme. This research offers an innovative perspective and sets the platform for improved thyroid disorder diagnosis, with broader implications for healthcare and patient well-being.
Funding: Supported by the Joint Funds of the Chinese National Natural Science Foundation (NSFC) (Grant No. U2242213), the National Key Research and Development (R&D) Program of the Ministry of Science and Technology of China (Grant No. 2021YFC3000902), and the National Science Foundation for Young Scholars (Grant No. 42205166).
Abstract: Ensemble prediction is widely used to represent the uncertainty of single deterministic Numerical Weather Prediction (NWP) caused by errors in initial conditions (ICs). The traditional Singular Vector (SV) initial perturbation method tends to capture only synoptic-scale initial uncertainty rather than mesoscale uncertainty in global ensemble prediction. To address this issue, a multiscale SV initial perturbation method based on the China Meteorological Administration Global Ensemble Prediction System (CMA-GEPS) is proposed to quantify multiscale initial uncertainty. The multiscale SV initial perturbation approach entails calculating multiscale SVs at different resolutions with multiple linearized physical processes to capture fast-growing perturbations from mesoscale to synoptic scale in target areas, and combining these SVs using a Gaussian sampling method with amplitude coefficients to generate initial perturbations. The energy norm, energy spectrum, and structure of the multiscale SVs and their impact on GEPS are then analyzed based on a batch experiment in different seasons. The results show that the multiscale SV initial perturbations possess more energy and capture more mesoscale uncertainties than the traditional single-SV method. Meanwhile, multiscale SV initial perturbations can reflect the strongest dynamical instability in target areas. Compared to single-scale SVs, their performance in global ensemble prediction is shown to (i) improve the relationship between the ensemble spread and the root-mean-square error and (ii) provide better probability forecast skill for atmospheric circulation during the late forecast period and for short- to medium-range precipitation. This study provides scientific evidence and application foundations for the design and development of a multiscale SV initial perturbation method for the GEPS.
Abstract: Accurate wind power forecasting is critical for system integration and stability as reliance on renewable energy grows. Traditional approaches frequently struggle with complex data and non-linear relationships. This article presents a novel approach to hybrid ensemble learning that is based on rigorous requirements engineering concepts. The approach identifies significant parameters influencing forecasting accuracy by evaluating real-time Modern-Era Retrospective Analysis for Research and Applications (MERRA2) data from several European wind farms using in-depth stakeholder research and requirements elicitation. Ensemble learning is used to develop a robust model, while a temporal convolutional network handles time-series complexities and data gaps. The ensemble-temporal neural network is enhanced by varying input parameters including training layers, hidden and dropout layers, and activation and loss functions. The proposed framework is further analyzed by comparison with state-of-the-art forecasting models in terms of Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The performance indicators show that the proposed model achieves error reductions of approximately 16.67%, 28.57%, and 81.92% for MAE, and 38.46%, 17.65%, and 90.78% for RMSE, for MERRA wind farms 1, 2, and 3, respectively, compared to other existing methods. These quantitative results show the effectiveness of the proposed model, with MAE values ranging from 0.0010 to 0.0156 and RMSE values ranging from 0.0014 to 0.0174. This work highlights the effectiveness of requirements engineering in wind power forecasting, leading to enhanced forecast accuracy and grid stability and ultimately paving the way for more sustainable energy solutions.
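The error-reduction percentages quoted above are presumably relative decreases in MAE or RMSE against a baseline model; the abstract does not give the formula. A minimal Python sketch under that assumption follows, with made-up toy data rather than values from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def error_reduction_pct(baseline_error, model_error):
    """Relative error decrease of the model versus a baseline, in percent."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Toy, hypothetical normalized power values (not from the paper)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

model_mae = mae(y_true, y_pred)                   # 0.25
model_rmse = rmse(y_true, y_pred)                 # sqrt(0.125), about 0.354
reduction = error_reduction_pct(0.5, model_mae)   # 50.0 against a baseline MAE of 0.5
```

Because RMSE squares each residual, it penalizes large misses more than MAE does, so the two reduction percentages for the same farm need not agree.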
Abstract: With the advancement of artificial intelligence, traffic forecasting is attracting growing interest as a means of optimizing route planning and enhancing service quality. Traffic volume is an influential parameter for planning and operating traffic structures. This study proposes an improved ensemble-based deep learning method to solve traffic volume prediction problems. A set of optimal hyperparameters is also applied to improve the performance of the learning process. The fusion of these methodologies aims to harness the capacity of ensemble empirical mode decomposition to discern complex traffic patterns and the proficiency of long short-term memory in learning temporal relationships. First, an automatic vehicle identification dataset is obtained and used in the preprocessing stage of the ensemble empirical mode decomposition model. Second, traffic volume is predicted using the long short-term memory algorithm. Next, the study employs a trial-and-error approach to select a set of optimal hyperparameters, including the lookback window, the number of neurons in the hidden layers, and the gradient descent optimizer. Finally, the fusion of the obtained results leads to a final traffic volume prediction. The experimental results show that the proposed method outperforms other benchmarks on various evaluation measures, including mean absolute error, root mean squared error, mean absolute percentage error, and R-squared. The achieved R-squared value reaches 98%, and the other evaluation indices surpass those of the competing methods. These findings highlight the accuracy of the traffic pattern prediction and offer promising prospects for enhancing transportation management systems and urban infrastructure planning.
Funding: Supported by the National Natural Science Foundation of China Youth Fund (12105234).
Abstract: The distribution of the nuclear ground-state spin in a two-body random ensemble (TBRE) was studied using a general classification neural network (NN) model with two-body interaction matrix elements as input features and the corresponding ground-state spins as labels or output predictions. The quantum many-body problem exceeds the capability of our optimized NNs in terms of accurately predicting the ground-state spin of each sample within the TBRE. However, our NN model effectively captured the statistical properties of the ground-state spin because it learned the empirical regularity of the ground-state spin distribution in the TBRE, as discovered by physicists.
Abstract: Sentence classification is the process of categorizing a sentence based on its context. Sentence categorization requires more semantic features than other tasks, such as dependency parsing, which requires more syntactic elements. Most existing strategies focus on the general semantics of a conversation without involving the context of the sentence, recognizing the progress, or comparing impacts. An ensemble of pre-trained language models is used here to classify the sentences of a conversation corpus. The conversational sentences are classified into four categories: information, question, directive, and commission. These classification label sequences are used to analyze the conversation progress and predict the pecking order of the conversation. An ensemble of Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pretraining Approach (RoBERTa), Generative Pre-Trained Transformer (GPT), DistilBERT, and Generalized Autoregressive Pretraining for Language Understanding (XLNet) models is trained on the conversation corpus with hyperparameters. Hyperparameter tuning is carried out for better performance on sentence classification. This Ensemble of Pre-trained Language Models with Hyperparameter Tuning (EPLM-HT) system is trained on an annotated conversation dataset. The proposed approach outperformed the base BERT, GPT, DistilBERT, and XLNet transformer models. The proposed ensemble model with the fine-tuned parameters achieved an F1-score of 0.88.
Funding: Financially supported by the National Natural Science Foundation of China (Grant No. 42002134), the China Postdoctoral Science Foundation (Grant No. 2021T140735), and the Science Foundation of China University of Petroleum, Beijing (Grant Nos. 2462020XKJS02 and 2462020YXZZ004).
Abstract: The relationship between well logs and lithofacies is typically complex, which leads to low accuracy in lithofacies identification. Machine learning (ML) methods are often applied to identify lithofacies using logs labelled by rock cores. However, the accuracy of these methods is limited. To further improve it, a practical and novel ensemble learning strategy and principles are proposed in this work, which allow geologists not familiar with ML to establish a good ML lithofacies identification model and help geologists familiar with ML further improve the accuracy of lithofacies identification. The ensemble learning strategy combines ML methods as sub-classifiers to generate a comprehensive lithofacies identification model, which aims to reduce the variance errors in prediction. Each sub-classifier is trained on randomly sampled labelled data with random features. The novelty of this work lies in the ensemble principles, which make the sub-classifiers just overfit through algorithm parameter setting and sub-dataset sampling. The principles can help reduce the bias errors in the prediction. Two issues are discussed, namely (1) whether only a relatively simple single-classifier method can serve as a sub-classifier, and how to select proper ML methods as sub-classifiers; and (2) whether different kinds of ML methods can be combined as sub-classifiers and, if so, how to determine a proper combination. To test the effectiveness of the ensemble strategy and principles for lithofacies identification, different kinds of machine learning algorithms are selected as sub-classifiers, including regular classifiers (LDA, NB, KNN, ID3 tree, and CART), a kernel method (SVM), and ensemble learning algorithms (RF, AdaBoost, XGBoost, and LightGBM). The experiments used a published lithofacies dataset from the Daniudi gas field (DGF) in the Ordos Basin, China. Based on a series of comparisons between ML algorithms and their corresponding ensemble models using the ensemble strategy and principles, the following conclusions are drawn: (1) not only decision trees but also other single classifiers and ensemble-learning classifiers can be used as sub-classifiers of homogeneous ensemble learning, and the ensemble can improve the accuracy of the original classifiers; (2) the ensemble principles for the introduced homogeneous and heterogeneous ensemble strategy are effective in promoting ML in lithofacies identification; (3) in practice, a heterogeneous ensemble is more suitable for building a more powerful lithofacies identification model, though it is complex.
Abstract: Given the increased number of COVID-19 cases, Ensemble Machine Learning (EML) would be an effective tool for combatting this pandemic outbreak. An ensemble of classifiers can improve the performance of single machine learning (ML) classifiers, especially with stacking-based ensemble learning. Stacking utilizes heterogeneous base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results. However, building an ensemble often causes the model performance to decrease when a growing number of learners are not properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacking-based ensemble learning optimized by a Genetic Algorithm (GA), called GA-Stacking, for outbreak prediction and health decision-aid processes. GA-Stacking utilizes five well-known classifiers at its first level: Decision Tree (DT), Random Forest (RF), Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost). It also uses a GA to determine the number, combination, and trust of these base classifiers, with the Mean Squared Error (MSE) as the fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) classifier is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance, with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model achieved good performance when compared with existing bagging-based approaches. The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures.
Funding: Supported by the National Natural Science Foundation of China under Grant 62002122, the Guangzhou Municipal Science and Technology Bureau under Grant 202102080492, and the Key Scientific and Technological Research and Department of Education of Guangdong Province under Grant 2019KTSCX014.
Abstract: Deep neural networks have achieved tremendous success in various fields, and the structure of these networks is a key factor in their success. In this paper, we focus on ensemble learning based on deep network structure and propose a new deep network ensemble framework (DNEF). Unlike other ensemble learning models, DNEF is an ensemble learning architecture of network structures, with serial iteration between the hidden layers, while base classifiers are trained in parallel within these hidden layers. Specifically, DNEF uses randomly sampled data as input and implements serial iteration based on the weighting strategy between hidden layers. In the hidden layers, each node represents a base classifier, and multiple nodes generate training data for the next hidden layer according to the transfer strategy. DNEF operates based on two strategies: (1) the weighting strategy calculates the training instance weights of the nodes according to their weaknesses in the previous layer; (2) the transfer strategy adaptively selects each node's instances with weights as transfer instances and transfer weights, which are combined with the training data of nodes as input for the next hidden layer. These two strategies improve the accuracy and generalization of DNEF. The ensemble of all nodes is integrated as the final output of DNEF. The experimental results show that DNEF achieves high accuracy and surpasses both traditional ensemble models and recent deep ensemble methods.
Funding: Jointly supported by the National Natural Science Foundation of China (Grant Nos. 42225501, 42105059).
Abstract: Based on a simple coupled Lorenz model, we investigate how to assess a suitable initial perturbation scheme for ensemble forecasting in a multiscale system involving slow dynamics and fast dynamics. Four initial perturbation approaches are used in the ensemble forecasting experiments: the random perturbation (RP), the bred vector (BV), the ensemble transform Kalman filter (ETKF), and the nonlinear local Lyapunov vector (NLLV) methods. Results show that, regardless of the method used, the ensemble averages behave indistinguishably from the control forecasts during the first few time steps. Due to different error growth in systems with different time scales, the ensemble averages perform better than the control forecast after very short lead times in the fast subsystem but only after a relatively long period of time in the slow subsystem. Due to the coupled dynamic processes, adding perturbations to either fast or slow variables can improve the forecasting skill for both fast and slow variables. Regarding the initial perturbation approaches, the NLLVs show higher forecasting skill than the BVs or RPs overall. The NLLVs and ETKFs had nearly equivalent prediction skill, but the NLLVs performed best by a narrow margin. In particular, when perturbations are added to slow variables, the independent perturbations (NLLVs and ETKFs) perform much better in ensemble prediction. These results carry over directly to a real coupled air–sea model: for the prediction of oceanic variables, using independent perturbations (NLLVs) and adding perturbations to oceanic variables are expected to result in better ensemble prediction performance.
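The random-perturbation (RP) ensemble and its early agreement with the control forecast can be illustrated on a much simpler system than the coupled model used in the paper. The Python sketch below assumes a single, uncoupled Lorenz-63 system with standard parameters, RK4 time stepping, and uniform random initial perturbations; every name, parameter, and initial condition here is an illustrative assumption.

```python
import random

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the classic Lorenz-63 system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))
    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, n_steps, dt=0.01):
    for _ in range(n_steps):
        state = rk4_step(state, dt)
    return state

def rp_ensemble_mean(control_ic, n_members, amplitude, n_steps, seed=0):
    """Mean forecast of members started from randomly perturbed initial states."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_members):
        ic = tuple(c + rng.uniform(-amplitude, amplitude) for c in control_ic)
        finals.append(integrate(ic, n_steps))
    return tuple(sum(f[i] for f in finals) / n_members for i in range(3))

control_ic = (1.0, 1.0, 20.0)
control = integrate(control_ic, n_steps=5)
mean = rp_ensemble_mean(control_ic, n_members=20, amplitude=1e-3, n_steps=5)
# Largest component-wise gap between ensemble mean and control after 5 steps
gap = max(abs(m - c) for m, c in zip(mean, control))
```

After only a few steps the small initial perturbations have had little time to grow, so `gap` stays tiny; raising `n_steps` lets the members diverge and the ensemble mean depart from the control run. BV, ETKF, and NLLV perturbations require additional machinery that this sketch does not attempt.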
Funding: Supported by the Scientific Research Foundation of Liaoning Provincial Department of Education (No. LJKZ0139).
Abstract: A probabilistic multi-dimensional selective ensemble learning method and its application to the prediction of users' online purchase behavior are studied in this work. First, the classifiers are integrated along the dimension of predicted probability, and a pruning algorithm based on greedy forward search is obtained by combining the two indicators of accuracy and complementarity. The pruning algorithm is then integrated into the Stacking ensemble method to establish a prediction model of users' online shopping behavior based on the probabilistic multi-dimensional selective ensemble method. Finally, the proposed method is compared with the individual learners in the ensemble and with the Stacking ensemble method without pruning. The experimental results show that the proposed method can reduce the scale of the ensemble, improve the prediction accuracy of the model, and predict users' online purchase behavior.
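Greedy forward-search pruning can be sketched as follows in Python. This simplified version assumes binary labels, combines members by averaging their predicted probabilities, and scores candidates by validation accuracy only; the paper's complementarity indicator is not specified in the abstract and is omitted. All names and data are hypothetical.

```python
def ensemble_labels(probs_by_member, members, threshold=0.5):
    """Average the positive-class probabilities of the members, then threshold."""
    n_samples = len(probs_by_member[members[0]])
    return [
        1 if sum(probs_by_member[m][i] for m in members) / len(members) >= threshold else 0
        for i in range(n_samples)
    ]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def greedy_forward_prune(probs_by_member, y_true):
    """Add the member that most improves ensemble accuracy; stop when none does."""
    selected, best = [], 0.0
    remaining = sorted(probs_by_member)
    while remaining:
        scored = [
            (accuracy(y_true, ensemble_labels(probs_by_member, selected + [m])), m)
            for m in remaining
        ]
        score, member = max(scored, key=lambda s: s[0])
        if score <= best:
            break  # no candidate improves the current ensemble
        selected.append(member)
        remaining.remove(member)
        best = score
    return selected, best

# Hypothetical validation labels and P(purchase) from three base learners
y_true = [1, 0, 1, 1, 0, 0]
probs_by_member = {
    "a": [0.9, 0.2, 0.8, 0.3, 0.1, 0.8],
    "b": [0.8, 0.1, 0.4, 0.9, 0.2, 0.3],
    "c": [0.3, 0.3, 0.9, 0.4, 0.4, 0.2],
}
selected, best = greedy_forward_prune(probs_by_member, y_true)
# selected == ["b", "c"], best == 1.0
```

In this toy case learner "c" is weak on its own but corrects the one mistake of "b", so the pruned two-member ensemble beats every individual learner, which is the complementarity effect the abstract refers to.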
Abstract: The Internet of Things (IoT) will have cultural, commercial, and social effects on life in the future. Nodes participating in an IoT network are attractive targets for cyber-attacks, and the identification of attacks and anomalies in IoT infrastructure is a growing problem in the IoT domain. A Machine Learning Based Ensemble Intrusion Detection (MLEID) method is applied to address this problem by minimizing malicious actions in botnet attacks on the Message Queue Telemetry Transport (MQTT) and Hyper-Text Transfer Protocol (HTTP) protocols. The proposed work makes two significant contributions: feature selection and attack detection. In the feature selection stage, new features are chosen by Improved Ant Colony Optimization (IACO), and attack detection is then carried out based on a combination of their possible properties. The IACO approach focuses on defining the important attacker features against HTTP and MQTT. In the IACO algorithm, the constant factor is calculated against HTTP and MQTT based on the mean function for each element. For attack detection, the performance of several machine learning models, namely Distance Decision Tree (DDT), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Mahalanobis Distance Support Vector Machine (MDSVM), was compared in accurately predicting attacks on the IoT network. The outcomes of these classifiers are combined into the ensemble model. The proposed MLEID strategy effectively identified malicious incidents. The UNSW-NB15 dataset is used to test the MLEID technique using data from simulated IoT sensors. In addition, the proposed MLEID technique has a higher detection rate and a lower false-positive rate than other conventional techniques.
Funding: This study was supported by the National Natural Science Foundation of China (Grant Nos. 42030610 and 42075083) and the Innovation and Development Project of China Meteorological Administration (CXFZ2022J014).
Abstract: This study investigates the influences of urban land cover on the extreme rainfall event over Zhengzhou city in central China on 20 July 2021 using the Weather Research and Forecasting model at a convection-permitting scale [1-km resolution in the innermost domain (d3)]. Two ensembles of simulations (CTRL, NURB), each consisting of 11 members with a multi-layer urban canopy model and various combinations of physics schemes, were conducted using different land cover scenarios: (i) the real urban land cover, and (ii) all cities in d3 replaced with natural land cover. The results suggest that CTRL reasonably reproduces the spatiotemporal evolution of rainstorms and the 24-h rainfall accumulation over the key region, although the maximum hourly rainfall is underestimated and displaced to the west or southwest by most members. The ensemble mean 24-h rainfall accumulation over the key region of heavy rainfall is reduced by 13%, and the maximum hourly rainfall simulated by each member is reduced by 15–70 mm, in CTRL relative to NURB. The reduction in the simulated rainfall by urbanization is closely associated with numerous cities and towns to the south, southeast, and east of Zhengzhou. Their heating effects jointly lead to the formation of anomalous upward motions in and above the planetary boundary layer (PBL), which exaggerates the PBL drying effect due to reduced evapotranspiration and also enhances the wind stilling effect due to increased surface friction in urban areas. As a result, the lateral inflows of moisture and high-θe (equivalent potential temperature) air from the south and east to Zhengzhou are reduced.
Funding: Supported by the National Natural Science Foundation of China (61903326, 61933015).
Abstract: The large blast furnace is essential equipment in iron and steel manufacturing. Due to the complex operation process and frequent fluctuations of variables, conventional monitoring methods often raise false alarms. To address this problem, an ensemble of greedy dynamic principal component analysis-Gaussian mixture model (EGDPCA-GMM) is proposed in this paper. First, PCA-GMM is introduced to deal with the collinearity and the non-Gaussian distribution of blast furnace data. Second, to capture the dynamics of the data, a greedy algorithm is used to determine the extended variables and their corresponding time lags, so as to avoid introducing unnecessary noise. A bagging ensemble is then combined with the greedy extension to eliminate the randomness introduced by the greedy algorithm and further reduce the false alarm rate (FAR) of the monitoring results. Finally, the algorithm is applied to the blast furnace of a large iron and steel group in South China to verify its performance. Compared with the basic algorithms, the proposed method achieves the lowest FAR while keeping the missed alarm rate (MAR) stable.
Funding: Supported by the Natural Science Foundation of Beijing Municipality (Beijing Natural Science Foundation) (No. 7191005).
Abstract: Compton camera-based prompt gamma (PG) imaging has been proposed for range verification during proton therapy. However, a deviation between the PG and dose distributions, as well as the difference between the reconstructed PG and exact values, limits the effectiveness of the approach in accurate range monitoring during clinical applications. The aim of the study was to realize a PG-based dose reconstruction with a Compton camera, thereby further improving the prediction accuracy of in vivo range verification and providing a novel method for beam monitoring during proton therapy. In this paper, we present an approach based on a subset-driven origin ensemble with resolution recovery and a double evolutionary algorithm to reconstruct the dose depth profile (DDP) from the gamma events obtained by a cadmium-zinc-telluride Compton camera with limited position and energy resolution. Simulations of proton pencil beams with a clinical particle rate irradiating phantoms made of different materials and a CT-based thoracic phantom were used to evaluate the feasibility of the proposed method. The results show that for a monoenergetic proton pencil beam irradiating a homogeneous-material box phantom, the accuracy of the reconstructed DDP was within 0.3 mm for range prediction and within 5.2% for dose prediction. In particular, for 1.6-Gy irradiation in the therapy simulation of thoracic tumors, the range deviation of the reconstructed spread-out Bragg peak was within 0.8 mm, and the relative dose deviation in the peak area was less than 7% compared to the exact values. The results demonstrate the potential and feasibility of the proposed method for future Compton-based accurate dose reconstruction and range verification during proton therapy.
Funding: Supported by the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2016-AII).
Abstract: The accurate prediction of soybean yield is of great significance for agricultural production, monitoring, and early warning. Although previous studies have used machine learning algorithms to predict soybean yield based on meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms to improve prediction accuracy through ensemble learning has not been studied in depth. This study analyzed daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang–Huai region), covering 34 years. Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as base models to establish a high-precision and highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated using five-year sliding prediction and four regression indicators for the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield based on the stacking model in the 173 counties showed that the predictions reflect the spatiotemporal distribution of soybean yield in detail, and the mean absolute percentage error (MAPE) was less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.