Selecting which explanatory variables to include in a given score is a common difficulty, as a balance must be found between statistical fit and practical application. This article presents a methodology for constructing parsimonious event risk scores that combine a stepwise selection of variables with ensemble scores obtained by aggregating several scores, using several classifiers, bootstrap samples and various modalities of random selection of variables. Selection methods based on a probabilistic model can be used to perform stepwise selection for a given classifier such as logistic regression, but not directly for an ensemble classifier constructed by aggregating several classifiers. Three selection methods are proposed in this framework: two involve a backward selection of the variables based on their coefficients in an ensemble score, and the third involves a forward selection of the variables maximizing the AUC. The stepwise selection produces a succession of scores, from which the practitioner can choose the one that best fits their needs. These three methods are compared in an application to construct parsimonious short-term event risk scores in chronic heart failure (HF) patients, using as event the composite endpoint of death or hospitalization for worsening HF within 180 days of a visit. Focusing on the fastest method, four scores are constructed, yielding out-of-bag AUCs ranging from 0.81 (26 variables) to 0.76 (2 variables).
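The forward-selection variant described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The AUC is computed in-sample with the Mann-Whitney statistic rather than out-of-bag. `sum_score` is a hypothetical stand-in for the aggregated ensemble score; any function that maps a variable subset to risk scores can be passed as `fit_score`.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: P(score of an event > score of a non-event), ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def sum_score(X: Matrix, y: Sequence[int], cols: List[int]) -> List[float]:
    """Hypothetical stand-in for an ensemble score: sums the chosen columns,
    each oriented so that larger values point towards the event (y = 1)."""
    scores = [0.0] * len(X)
    for j in cols:
        pos = [row[j] for row, t in zip(X, y) if t == 1]
        neg = [row[j] for row, t in zip(X, y) if t == 0]
        sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
        for i, row in enumerate(X):
            scores[i] += sign * row[j]
    return scores


def forward_select(X: Matrix, y: Sequence[int],
                   fit_score: Callable[[Matrix, Sequence[int], List[int]], List[float]],
                   max_vars: int) -> List[Tuple[List[int], float]]:
    """Greedy forward selection: at each step add the variable whose inclusion gives
    the highest AUC; returns the nested sequence of (selected variables, AUC)."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    path = []
    while remaining and len(selected) < max_vars:
        best_auc, best_j = -1.0, remaining[0]
        for j in remaining:
            a = auc(fit_score(X, y, selected + [j]), y)
            if a > best_auc:
                best_auc, best_j = a, j
        selected.append(best_j)
        remaining.remove(best_j)
        path.append((list(selected), best_auc))
    return path
```

Each entry of the returned path is one candidate score in the succession of scores. A practitioner would then pick the smallest subset whose AUC is acceptable.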
To address the insufficient accuracy of strip flatness prediction at the outlet of cold tandem rolling, the prediction performance of different ensemble methods was studied and a high-precision ensemble model for predicting outlet strip flatness was established. First, bagging, boosting, and stacking ensemble methods were applied to five base models: linear regression (LR), K-nearest neighbors (KNN), support vector regression, regression trees (RT), and a backpropagation neural network (BPN). Second, three existing ensemble models, i.e., random forest, extremely randomized trees (ET), and extreme gradient boosting, were tested and the results compared. The results show that bagging, boosting, and stacking give the largest improvement for the regression tree model, raising its prediction accuracy by 5.28%, 6.51%, and 5.32%, respectively. Stacking improves both simple and complex models, and the gain is largest for the simple base model: 4.69% higher than the base KNN model. Among all ensemble models, the stacking model with level-1 learners (ET, AdaBoost-RT, LR, BPN) and a level-2 LR learner (EALB-LR) performed best and merits further study for industrial applications.
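A stacking ensemble of the kind described can be sketched as below: the out-of-fold predictions of the level-1 learners become the inputs of a level-2 linear regression. The sketch assumes NumPy and uses only two illustrative level-1 learners, ordinary least squares and a hand-written KNN regressor. The fold count, `k`, and all helper names are hypothetical, and this is not the EALB-LR configuration from the study.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def fit_knn(X, y, k=3):
    """K-nearest-neighbours regressor: mean target of the k closest training points."""
    def predict(Xn):
        d = np.linalg.norm(Xn[:, None, :] - X[None, :, :], axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


def stack(X, y, base_fitters, n_folds=5, seed=0):
    """Two-level stacking: out-of-fold level-1 predictions feed a level-2 OLS."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.zeros((len(X), len(base_fitters)))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
        for m, fit in enumerate(base_fitters):
            oof[test_idx, m] = fit(X[train_idx], y[train_idx])(X[test_idx])
    meta = fit_ols(oof, y)                        # level-2 linear regression
    level1 = [fit(X, y) for fit in base_fitters]  # level-1 models refit on all data
    return lambda Xn: meta(np.column_stack([f(Xn) for f in level1]))
```

The level-2 model is fitted on out-of-fold predictions rather than in-sample ones. This keeps it from simply trusting whichever level-1 learner overfits the training data most.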
The estimation of model parameters is an important subject in engineering. In this area, the prevailing approach is to estimate or calculate the parameters as deterministic quantities. In this study, we treat the model parameters as random variables and describe the general form of the parameter distribution inference problem. Within this framework, we propose an ensemble Bayesian method that combines Bayesian inference with the Markov chain Monte Carlo (MCMC) method. Experiments on a finite cylindrical reactor and a 2D IAEA benchmark problem show that the proposed method converges quickly and estimates parameters effectively, even when several correlated parameters are estimated simultaneously. Our experiments include cases that call external engineering software, demonstrating that the method can be applied in engineering fields such as nuclear reactor engineering.
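The MCMC component of such a Bayesian parameter inference can be illustrated with a random-walk Metropolis-Hastings sampler for a single parameter. This is a generic sketch, not the authors' ensemble Bayesian method. The measurements, the noise level `sigma`, the flat prior, the proposal `step`, and the burn-in length are all hypothetical.

```python
import math
import random
from typing import Callable, List


def metropolis_hastings(log_post: Callable[[float], float], theta0: float,
                        n_samples: int, step: float, seed: int = 0) -> List[float]:
    """Random-walk Metropolis-Hastings for a scalar parameter.

    log_post: unnormalized log posterior density of the parameter
    step:     standard deviation of the symmetric Gaussian proposal
    """
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    chain = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        # Accept with probability min(1, exp(lp_new - lp)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
        chain.append(theta)
    return chain


# Hypothetical example: repeated noisy measurements of one model parameter,
# Gaussian noise with known sigma and a flat prior.
measurements = [1.9, 2.1, 2.0, 2.2, 1.8]
sigma = 0.1


def log_posterior(theta: float) -> float:
    return -sum((m - theta) ** 2 for m in measurements) / (2 * sigma ** 2)


chain = metropolis_hastings(log_posterior, theta0=1.0, n_samples=20000, step=0.05)
posterior = chain[2000:]  # discard burn-in samples
```

Under these assumptions the exact posterior is Gaussian. Its mean equals the measurement average (2.0) and its standard deviation is sigma/√5, so the retained samples should concentrate around 2.0. For a reactor model, `log_posterior` would instead wrap a call to the engineering code.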
Reservoir identification and production prediction are two of the most important tasks in petroleum exploration and development. Machine learning (ML) methods have been used in petroleum-related studies but have not been applied to reservoir identification, nor to production prediction based on reservoir identification. Production forecasting studies are typically based on overall reservoir thickness and lack accuracy when reservoirs contain water or dry layers that produce no oil. In this paper, a systematic ML method was developed that uses classification models for reservoir identification and regression models for production prediction, with the production models built on the reservoir identification results. For reservoir identification, seven optimized ML methods were used: four typical single ML methods and three ensemble ML methods. These methods classify the reservoir into five types of layers: water, dry, and three grades of oil layer (I, II, and III). The validation and test results of the seven optimized methods suggest that the three ensemble methods perform better than the four single methods in reservoir identification. XGBoost produced the most accurate model, with an accuracy of up to 99%. The effective thickness of the I and II oil layers determined during reservoir identification was fed into the production prediction models. Because effective thickness accounts for the distribution of water and oil, it yields more reasonable production predictions than overall reservoir thickness. To validate the superiority of this approach, reference models using overall reservoir thickness were built for comparison. The models based on effective thickness outperformed the reference models on every evaluation metric, with prediction accuracy 10% higher than that of the reference model. Free of the personal error and data distortion present in traditional methods, this novel system analyzes data rapidly and reduces the time required to resolve reservoir classification and production prediction challenges. The ML models using the effective thickness obtained from reservoir identification predicted oil production more accurately than previous studies that used overall reservoir thickness.
This paper uses the classical ensemble method to study the double ionization of a two-dimensional (2D) model helium atom interacting with an elliptically polarized laser pulse. The classical ensemble calculation demonstrates that the ratio of double to single ionization decreases as the ellipticity of the driving field increases. The classical scenario shows that there are hardly any electron-electron recollisions with a circularly polarized laser pulse. The double ionization probability is studied for linearly and circularly polarized laser pulses. The classical numerical results are consistent with the semiclassical rescattering mechanism and agree qualitatively with experimental results and quantum calculations.
The present aim is to update, in an online setting and upon the arrival of new learning data, the parameters of a score constructed with an ensemble method involving linear discriminant analysis and logistic regression, without the need to store all previously obtained data. Poisson bootstrap and stochastic approximation processes were used with online standardized data to avoid numerical explosions; the convergence of these processes has been established theoretically. The empirical convergence of online ensemble scores to a reference “batch” score was studied on five different datasets from which data streams were simulated, comparing six different processes for constructing the online scores. For each score, 50 replications using a total of 10N observations (N being the size of the dataset) were performed to assess the convergence and stability of the method, computing the mean and standard deviation of a convergence criterion. A complementary study using 100N observations was also performed. All tested processes converged on all datasets after N iterations, except for one process on one dataset. The best processes were averaged processes using online standardized data and a piecewise constant step-size.
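The online ingredients named above can be combined as in the sketch below. These are online standardization, the Poisson bootstrap, and averaged stochastic approximation with a piecewise constant step-size. It is a simplified, hypothetical version. Only a logistic regression is updated (no linear discriminant analysis). The step-size schedule `piecewise_constant` and the number of replicates are assumptions. The ensemble score simply averages the replicates' averaged coefficients.

```python
import math
import random
from typing import Callable, List


def sigmoid(s: float) -> float:
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


class OnlineStandardizer:
    """Running mean and variance (Welford's algorithm), updated one observation at a time."""

    def __init__(self, dim: int):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim

    def update(self, x: List[float]) -> None:
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])

    def transform(self, x: List[float]) -> List[float]:
        out = []
        for j, v in enumerate(x):
            sd = math.sqrt(self.m2[j] / self.n) if self.n > 1 else 0.0
            out.append((v - self.mean[j]) / sd if sd > 0 else 0.0)
        return out


def piecewise_constant(t: int, base: float = 0.1, period: int = 500) -> float:
    # Hypothetical schedule: constant within blocks of `period` steps, decreasing across blocks.
    return base / (1 + t // period)


class PoissonBootstrapLogit:
    """B online logistic regressions; each replicate uses every new observation
    a Poisson(1)-distributed number of times (the Poisson bootstrap)."""

    def __init__(self, dim: int, n_boot: int,
                 step: Callable[[int], float] = piecewise_constant, seed: int = 0):
        self.rng = random.Random(seed)
        self.std = OnlineStandardizer(dim)
        self.w = [[0.0] * (dim + 1) for _ in range(n_boot)]    # current iterates
        self.avg = [[0.0] * (dim + 1) for _ in range(n_boot)]  # averaged iterates
        self.step = step
        self.t = 0

    def _poisson1(self) -> int:
        # Knuth's multiplication method for a Poisson(1) draw.
        limit, k, p = math.exp(-1.0), 0, self.rng.random()
        while p > limit:
            k += 1
            p *= self.rng.random()
        return k

    def update(self, x: List[float], y: int) -> None:
        self.std.update(x)
        z = [1.0] + self.std.transform(x)  # intercept + online-standardized features
        self.t += 1
        eta = self.step(self.t)
        for w, avg in zip(self.w, self.avg):
            for _ in range(self._poisson1()):  # bootstrap weight of this observation
                p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)))
                for j in range(len(w)):
                    w[j] += eta * (y - p) * z[j]  # stochastic gradient step on the log-likelihood
            for j in range(len(w)):
                avg[j] += (w[j] - avg[j]) / self.t  # running average of the iterates

    def coef(self) -> List[float]:
        # Ensemble coefficients: mean of the averaged iterates over the replicates.
        B = len(self.avg)
        return [sum(a[j] for a in self.avg) / B for j in range(len(self.avg[0]))]

    def predict_proba(self, x: List[float]) -> float:
        z = [1.0] + self.std.transform(x)
        return sigmoid(sum(c * zi for c, zi in zip(self.coef(), z)))
```

No observation is stored: each call to `update` uses the new vector once and then discards it, which is the point of the online setting.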
An ensemble prediction model of solar proton events (SPEs) that combines information on solar flares and coronal mass ejections (CMEs) is built. In this model, solar flares are parameterized by the peak flux, the duration, and the longitude, and CMEs are parameterized by the width, the speed, and the measurement position angle. The importance of each parameter for the occurrence of SPEs is estimated by the information gain ratio. We find that the CME width and speed are more informative than the flare's peak flux and duration. As the physical mechanism of SPEs is not entirely clear, a hidden naive Bayes approach, a probability-based method from machine learning, is used to build the prediction model from observational data. SPEs are known to originate from solar flares and/or shock waves associated with CMEs. Hence, we first build two base prediction models using the properties of solar flares and CMEs, respectively. The outputs of these models are then combined to generate the ensemble prediction model of SPEs. The ensemble model, which incorporates the complementary information of solar flares and CMEs, performs better than either base model alone.
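The information gain ratio used to rank the flare and CME parameters can be computed as below for a discretized feature. The sketch assumes the continuous parameters (peak flux, speed, width, and so on) have already been binned into categories. The binning scheme and the example values are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting `labels` by `feature`, divided by the split information."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)  # intrinsic information of the split itself
    return info_gain / split_info if split_info > 0 else 0.0
```

Dividing by the split information penalizes features with many distinct bins. Those features would otherwise gain an unfair advantage under plain information gain.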
BACKGROUND Despite the frequent progression from Parkinson’s disease (PD) to Parkinson’s disease dementia (PDD), the evidence for diagnosing early-onset Parkinson dementia (EOPD) at an early stage is still insufficient. AIM To explore how accurately sociodemographic factors, Parkinson’s motor symptoms, Parkinson’s non-motor symptoms, and rapid eye movement sleep disorder predict EOPD, using PD multicenter registry data. METHODS This study analyzed 342 Parkinson patients younger than 65 years (66 EOPD patients and 276 PD patients with normal cognition). An EOPD prediction model was developed using a random forest algorithm, and its accuracy was compared with that of a naive Bayesian model and discriminant analysis. RESULTS The overall accuracy of the random forest was 89.5%, higher than that of discriminant analysis (78.3%) and the naive Bayesian model (85.8%). In the random forest model, the Korean Mini Mental State Examination (K-MMSE) score, the Korean Montreal Cognitive Assessment (K-MoCA) score, the Clinical Dementia Rating (CDR) sum of boxes, the CDR global score, the motor score of the Unified Parkinson’s Disease Rating Scale (UPDRS), and the Korean Instrumental Activities of Daily Living (K-IADL) score were the major variables with high weight for EOPD prediction. Among them, the K-MMSE score was the most important factor in the final model. CONCLUSION Parkinson-related motor symptoms (e.g., the UPDRS motor score) and instrumental daily performance (e.g., the K-IADL score), in addition to cognitive screening indicators (e.g., the K-MMSE and K-MoCA scores), were highly accurate predictors of EOPD.
Recently, the combination of video services and 5G networks has been gaining attention in wireless communication. With the rapid growth of 5G network usage and the massive popularity of three-dimensional video streaming, the quality of experience (QoE) of video in 5G systems has become highly significant to both customers and service providers. Therefore, effectively categorizing QoE-aware video streaming is imperative for achieving greater client satisfaction. This work makes the following contributions. First, a simulation platform based on NS-3 is introduced to analyze and improve the performance of video services. The simulation provides real-time measurements, avoiding the expense of real-world equipment. Second, a framework for QoE-aware video streaming categorization in 5G networks is introduced, based on machine learning (ML) with hyperparameter tuning (HPT). It implements an enhanced hyperparameter tuning (EHPT) ensemble and a decision tree (DT) classifier for video streaming categorization. The performance of the ML approach is assessed in terms of precision, accuracy, recall, and computation time. This paper shows that the ML classifiers achieve QoE prediction accuracies of 92.59% for the EHPT ensemble and 87.037% for the DT classifier.
Nitrogen dioxide (NO₂) poses a critical potential risk to environmental quality and public health. A reliable machine learning (ML) forecasting framework can provide valuable information to support government decision-making. Based on data from 1609 air quality monitors across China from 2014 to 2020, this study designed an ensemble ML model that integrates multiple types of spatiotemporal variables and three sub-models for time-sensitive prediction over a wide area. The ensemble model adds a residual connection to the gated recurrent unit (GRU) network and combines the strengths of a Transformer, extreme gradient boosting (XGBoost), and the GRU with residual connection. On the test set, this yields a root mean square error 4.1% ± 1.0% lower than that of XGBoost. The ensemble model shows strong prediction performance, with coefficients of determination of 0.91, 0.86, and 0.77 for 1-h, 3-h, and 24-h averages on the test set, respectively. In particular, the model performs well, with low spatial uncertainty, in Central, East, and North China, the regions with the densest monitoring sites. Interpretability analysis based on Shapley values at different temporal resolutions shows that atmospheric chemical processes contribute more to hourly predictions than to daily predictions, whereas meteorological conditions become more prominent at the daily scale. Compared with existing models at different spatiotemporal scales, the present model can be applied at any air quality monitoring station across China to provide rapid and dependable NO₂ forecasts, which will help in developing effective control policies.
Industrial Internet of Things (IIoT) systems depend on a growing number of edge devices, such as sensors, controllers, and robots, for data collection, transmission, storage, and processing. Malicious or abnormal behavior by any of these devices can jeopardize the security of the entire IIoT. Moreover, such devices can allow malicious software installed on end nodes to penetrate the network. This paper presents a parallel ensemble model for threat hunting based on anomalies in the behavior of IIoT edge devices. The proposed model is flexible enough to use several state-of-the-art classifiers as base learners, and it efficiently classifies multi-class anomalies using multi-class AdaBoost and majority voting. Experimental evaluations on a dataset consisting of multi-source normal records and multi-class anomalies show that our model outperforms existing approaches in terms of accuracy, F1 score, recall, and precision.
In this paper, an advanced, optimized Light Gradient Boosting Machine (LGBM) technique is proposed to identify intrusive activities in Internet of Things (IoT) networks. The major contributions are as follows: i) an optimized LGBM model has been developed to identify malicious activities in the IoT network; ii) an efficient evolutionary optimization approach has been adopted to find the optimal set of LGBM hyper-parameters for the problem, using a Genetic Algorithm (GA) with k-way tournament selection and uniform crossover to explore the hyper-parameter search space efficiently; iii) the performance of the proposed model is compared with state-of-the-art ensemble learning and machine learning models to assess its overall generalization performance and efficiency. Simulation results show that the proposed approach is superior to the other methods considered and is robust for intrusion detection in an IoT environment.
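The two GA operators named above can be sketched as follows, with each individual represented as a dictionary of hyper-parameter values. The LightGBM-style hyper-parameter names and the fitness values in the example are hypothetical. In practice, the fitness would be a validation score obtained by training LGBM with that configuration, which is not shown here.

```python
import random
from typing import Dict, Sequence, Tuple

Individual = Dict[str, float]


def tournament_select(population: Sequence[Individual], fitness: Sequence[float],
                      k: int, rng: random.Random) -> Individual:
    """k-way tournament: draw k distinct individuals at random and return the fittest."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitness[i])
    return population[winner]


def uniform_crossover(parent_a: Individual, parent_b: Individual, rng: random.Random,
                      p: float = 0.5) -> Tuple[Individual, Individual]:
    """Uniform crossover: each gene is swapped between the two children with probability 1 - p."""
    child_a, child_b = {}, {}
    for key in parent_a:
        if rng.random() < p:
            child_a[key], child_b[key] = parent_a[key], parent_b[key]
        else:
            child_a[key], child_b[key] = parent_b[key], parent_a[key]
    return child_a, child_b
```

A larger `k` increases selection pressure, because weak individuals rarely win a big tournament. Uniform crossover mixes hyper-parameters gene by gene rather than in contiguous blocks.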
The response of the train–bridge system exhibits clearly random behavior. A high traffic density and a long track maintenance period substantially increase the number of trains running on a bridge, so the maximum responses of the train and bridge over the whole maintenance period correspond to events of small probability. First, the coupling model of train–bridge systems is reviewed. Then, an ensemble method is presented that can estimate small probabilities for a dynamic system with stochastic excitations. The main idea of the ensemble method is to replace the physical model with a NARX (nonlinear autoregressive with exogenous input) model and to apply subset simulation with splitting to obtain the extreme-value distribution. Finally, the efficiency of the proposed method is compared with that of direct Monte Carlo simulation, and the exceedance probability of train responses under vertical track irregularity is discussed. The results show that, when estimating small probabilities of train responses under vertical track irregularity, the ensemble method reduces both the computation time per sample and the required number of samples.
Extreme learning machine (ELM) has been shown to be an effective learning mechanism for pattern classification and regression. However, its good performance relies on a large number of hidden-layer nodes, and the computational cost grows substantially as the number of hidden nodes increases. In this paper, we propose a novel algorithm, named constrained voting extreme learning machine (CV-ELM). Unlike the traditional ELM, CV-ELM determines the input weights and biases based on the differences between samples of different classes. To further improve accuracy, voting selection is introduced. The proposed method is evaluated on public benchmark datasets, and the experimental results show that it outperforms the original ELM algorithm. Further, we apply CV-ELM to classifying the superheat degree (SD) state in the aluminum electrolysis industry. There, the recognition accuracy reaches 87.4%, and the results demonstrate that the proposed method is more robust than existing state-of-the-art identification methods.
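For reference, the basic ELM that CV-ELM builds on can be sketched in a few lines with NumPy. The input weights and biases are drawn at random, and only the output weights are fitted, by a Moore-Penrose least-squares solution. This sketch does not implement the paper's class-difference weight initialization or its voting step. The `tanh` activation, the hidden-layer size, and the toy data are assumptions.

```python
import numpy as np


class ELM:
    """Basic extreme learning machine classifier (single hidden layer)."""

    def __init__(self, n_hidden: int, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer outputs with the fixed random input weights and biases.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random, never trained
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # least-squares output weights
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]
```

Because training reduces to one pseudo-inverse, its cost grows with the number of hidden nodes. That growth is the cost issue the abstract says CV-ELM addresses.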
Anomaly detection in smart homes helps enhance the health and safety of people who live alone. Previous studies on this topic have given relatively little attention to hybrid methods. This paper presents a two-step hybrid probabilistic anomaly detection model for smart homes. First, it employs several algorithms with different characteristics to detect anomalies in sensory data. Then, it aggregates their results using a Bayesian network, in which abnormal events are detected by calculating the probability of abnormality given the anomaly detection results of the base methods. Experimental evaluation on a real dataset demonstrates the effectiveness of the proposed method, which reduces false positives and increases true positives.
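The aggregation step can be illustrated with the simplest Bayesian network structure. The abnormality node is the parent of every base detector, and the detectors are conditionally independent given it (a naive Bayes structure). The paper's actual network structure and conditional probabilities are not given here. The prior and the per-detector true-positive and false-positive rates below are hypothetical.

```python
from typing import Sequence


def posterior_abnormal(detections: Sequence[int], prior: float,
                       tpr: Sequence[float], fpr: Sequence[float]) -> float:
    """P(abnormal | base detector outputs) under a naive-Bayes-structured network.

    detections[i]: 1 if base detector i flags the event, else 0
    prior:         P(abnormal)
    tpr[i]:        P(detector i flags | abnormal)
    fpr[i]:        P(detector i flags | normal)
    """
    joint_abn, joint_norm = prior, 1.0 - prior
    for d, t, f in zip(detections, tpr, fpr):
        joint_abn *= t if d else 1.0 - t
        joint_norm *= f if d else 1.0 - f
    return joint_abn / (joint_abn + joint_norm)
```

For example, with a 5% prior, rates `tpr = [0.9, 0.8, 0.7]` and `fpr = [0.1, 0.2, 0.05]`, and the first two detectors firing, the posterior rises to about 0.37. Requiring agreement between detectors is how such an aggregation can reduce false positives.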
A new second-order time-stepping ensemble hybridizable discontinuous Galerkin method is developed for parameterized convection-diffusion PDEs with various initial and boundary conditions, body forces, and time-dependent coefficients. For ensemble solutions in L∞(0,T;L²(Ω)), a superconvergent rate with respect to the degrees of freedom of the globally coupled unknowns is established for all polynomials of degree k≥0. The results of numerical experiments are consistent with the theoretical findings.
This paper studies how to extract useful information for monthly dynamical prediction from ensemble forecasts. The National Climate Center of China provided extended-range ensemble forecasts of the 500 hPa height field from January to May 1997, produced with the global spectral model (T63L16). The ensemble has 8 members, with initial perturbations generated by the lagged average forecast (LAF) method (0000, 0600, 1200, and 1800 GMT on two consecutive days). The ensemble spread is measured as the root-mean-square deviation of the ensemble members from the ensemble mean. It is significantly related to forecast skill, measured as the anomaly correlation or the root-mean-square distance between the ensemble mean forecast and the observation. The ensemble spread can be used to determine the number of useful forecast days N for the best estimate of the 30-day mean. Thus, a weighted-mean approach based on ensemble spread is put forward for monthly dynamical prediction. The anomaly correlation of the spread-weighted monthly mean is higher than that of both the arithmetic mean and the linear weighted mean, and the spread-weighted mean gives better results for the monthly mean circulation and anomaly. Key words: Monthly prediction; Ensemble method; Spread of ensemble. Supported by the Excellent National State Key Laboratory Project (49823002), the National Key Project ‘Study on Chinese Short-Term Climate Forecast System’ (96-908-02), and the IAP Innovation Foundation (8-1308). The data were provided by the National Climate Center of China. The authors wish to thank Ms. Chen Lijuan for her assistance.
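The ensemble spread defined above can be sketched as follows, together with one possible spread-based weighting of daily ensemble means into a monthly mean. The abstract does not give the exact weighting formula. Weighting each day by the inverse of its spread is an assumption made for illustration.

```python
import math
from typing import List, Sequence

Members = Sequence[Sequence[float]]  # members[m][g]: forecast of member m at grid point g


def ensemble_mean(members: Members) -> List[float]:
    n = len(members)
    return [sum(col) / n for col in zip(*members)]


def ensemble_spread(members: Members) -> float:
    """RMS deviation of the members from the ensemble mean, over members and grid points."""
    mean = ensemble_mean(members)
    sq = [(x - mu) ** 2 for member in members for x, mu in zip(member, mean)]
    return math.sqrt(sum(sq) / len(sq))


def spread_weighted_mean(daily_members: Sequence[Members], eps: float = 1e-12) -> List[float]:
    """Mean of the daily ensemble means, each day weighted by 1/spread (assumed weighting)."""
    means = [ensemble_mean(day) for day in daily_members]
    weights = [1.0 / (ensemble_spread(day) + eps) for day in daily_members]
    total = sum(weights)
    return [sum(w * m[g] for w, m in zip(weights, means)) / total
            for g in range(len(means[0]))]
```

Days on which the members disagree strongly (large spread) thus contribute less to the monthly estimate than days on which they agree.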
Time lapse, characteristic of aging, is a complex process that affects the reliability and security of biometric face recognition systems. This paper reports the novel use and effectiveness of deep learning in general, and convolutional neural networks (CNNs) in particular, for automatic rather than hand-crafted feature extraction, enabling face recognition that is robust across time lapse. A CNN architecture using VGG-Face deep learning is found to produce highly discriminative and interoperable features that are robust to aging variations, even across a mix of biometric datasets. The extracted features show high inter-class and low intra-class variability, leading to low generalization errors on aging datasets when ensembles of subspace discriminant classifiers are used. The proposed all-encompassing authentication methods achieve classification results on the challenging FG-NET and MORPH datasets that are competitive with state-of-the-art methods, including commercial face recognition engines. They also offer richer functionality and interoperability than existing methods, because they handle mixed biometric datasets, e.g., FG-NET and MORPH.
In recent years, the COVID-19 pandemic has negatively affected all aspects of social life. The virus is transmitted easily, through small liquid particles from the mouth or nose when people cough, sneeze, speak, sing, or breathe, so it can spread quickly and cause severe health problems. According to some research, as well as World Health Organization (WHO) recommendations, one of the most economical and effective ways to prevent the spread of the pandemic is for people to wear face masks in public spaces. A face mask helps block droplets and aerosols from passing from person to person, reducing the risk of infection. This simple method can reduce the spread of the particles by up to 95%. However, its effectiveness depends heavily on social consciousness, which is sometimes unreliable. To improve the effectiveness of mask wearing in public spaces, this research proposes an approach for detecting and warning people who do not wear a face mask or wear it incorrectly. The approach uses deep learning based on the GoogLeNet, AlexNet, and VGG16 models, whose results are combined by an ensemble method, namely bagging. In the experiments, the approach achieves more than 95% accuracy in face mask recognition.
Background: With improvements in next-generation DNA sequencing technology, genetic data can be collected at lower cost, and more machine learning techniques can be used to support cancer analysis and diagnosis. Methods: We developed an ensemble machine learning system, named the performance-weighted-voting model, for cancer type classification of 6,249 samples across 14 cancer types. Our ensemble system consists of five weak classifiers (logistic regression, SVM, random forest, XGBoost, and neural networks). We first used cross-validation to obtain the predictions of the five classifiers. The weights of the five weak classifiers were then obtained from their predictive performance by solving linear regression functions. The final predicted probability of the performance-weighted-voting model for a cancer type is the sum of each classifier's weight multiplied by its predicted probability. Results: Using the somatic mutation count of each gene as the input feature, the overall accuracy of the performance-weighted-voting model reached 71.46%. This was significantly higher than the accuracy of the five weak classifiers and of two other ensemble models, the hard-voting and soft-voting models. In addition, analysis of the model's predictive pattern showed that, in most cancer types, higher tumor mutational burden improves overall accuracy. Conclusion: This study has important clinical significance for identifying the origin of cancer, especially when the primary site cannot be determined. In addition, our model offers a good strategy for using ensemble systems in cancer type classification.
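The weighting step described in the Methods can be sketched under one possible reading. The one-hot encoded true labels are regressed, without intercept, on the base classifiers' cross-validated probability matrices, and the fitted coefficients serve as the voting weights. This formulation is an assumption, since the abstract does not specify the regression. The example probabilities are hypothetical, and NumPy is assumed.

```python
import numpy as np


def fit_voting_weights(probas, y, n_classes):
    """probas: list of (n_samples, n_classes) cross-validated probability matrices,
    one per base classifier. Solves min_w || onehot(y) - sum_i w_i * P_i ||^2."""
    target = np.eye(n_classes)[y].ravel()
    design = np.column_stack([p.ravel() for p in probas])
    w, *_ = np.linalg.lstsq(design, target, rcond=None)
    return w


def weighted_vote(probas, w):
    """Combined score per class = sum of weight * predicted probability; returns (scores, labels)."""
    combined = sum(wi * p for wi, p in zip(w, probas))
    return combined, combined.argmax(axis=1)
```

Least-squares weights are not constrained to be non-negative or to sum to one. The combined scores are therefore best treated as ranking scores unless they are renormalized.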
Funding (strip flatness prediction study): This study was supported by the National Key Research and Development Program of China (No. 2017YFB0304100) and Key Projects of the National Natural Science Foundation of China (No. 51634002).
Funding: Partially sponsored by the Natural Science Foundation of Shanghai (No. 23ZR1429300) and the Innovation Fund of CNNC (Lingchuang Fund).
Abstract: The estimation of model parameters is an important subject in engineering. The prevailing approach is to estimate or calculate them as deterministic parameters. In this study, we treat model parameters as random variables and describe the general form of the parameter distribution inference problem. Within this framework, we propose an ensemble Bayesian method that combines Bayesian inference with the Markov chain Monte Carlo (MCMC) method. Experiments on a finite cylindrical reactor and a 2D IAEA benchmark problem show that the proposed method converges quickly and estimates parameters effectively, even when several correlated parameters are estimated simultaneously. The experiments include cases that call engineering software, demonstrating that the method can be applied in engineering fields such as nuclear reactor engineering.
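To make the MCMC part concrete, here is a minimal random-walk Metropolis-Hastings sketch that infers a posterior distribution, rather than a single value, for two correlated parameters. Assumptions: `toy_model` is a hypothetical linear stand-in for the engineering code (a reactor calculation in the article), the noise level and Gaussian priors are invented, and the ensemble aspect of the authors' method is not reproduced.

```python
import math
import random
from typing import Callable, List, Tuple

def toy_model(theta: List[float], x: float) -> float:
    """Hypothetical forward model standing in for an engineering code."""
    return theta[0] * x + theta[1]

def make_log_posterior(xs: List[float], ys: List[float], sigma: float = 0.1,
                       prior_sd: float = 10.0) -> Callable[[List[float]], float]:
    """Gaussian likelihood with known noise sigma and independent N(0, prior_sd^2) priors."""
    def log_post(theta: List[float]) -> float:
        log_prior = -0.5 * sum((t / prior_sd) ** 2 for t in theta)
        log_lik = -0.5 * sum(((y - toy_model(theta, x)) / sigma) ** 2
                             for x, y in zip(xs, ys))
        return log_prior + log_lik
    return log_post

def metropolis_hastings(log_post: Callable[[List[float]], float], theta0: List[float],
                        n_steps: int, step: float,
                        rng: random.Random) -> Tuple[List[List[float]], float]:
    """Random-walk Metropolis-Hastings with an isotropic Gaussian proposal."""
    theta, lp = list(theta0), log_post(theta0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        lp_prop = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, exp(lp_prop - lp)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = proposal, lp_prop
            accepted += 1
        samples.append(list(theta))
    return samples, accepted / n_steps

rng = random.Random(0)
xs = [i / 19 for i in range(20)]
ys = [toy_model([2.0, 0.5], x) + rng.gauss(0.0, 0.1) for x in xs]  # synthetic data
samples, acc = metropolis_hastings(make_log_posterior(xs, ys), [0.0, 0.0], 20000, 0.05, rng)
kept = samples[5000:]  # discard burn-in
means = [sum(s[k] for s in kept) / len(kept) for k in range(2)]
sds = [math.sqrt(sum((s[k] - means[k]) ** 2 for s in kept) / (len(kept) - 1)) for k in range(2)]
print("posterior means:", means, "sds:", sds, "acceptance:", acc)
```

The output is a set of samples, so any summary (means, standard deviations, correlations between parameters) can be read off afterwards; this is the sense in which the parameters are treated as random variables.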
Abstract: Reservoir identification and production prediction are two of the most important tasks in petroleum exploration and development. Machine learning (ML) methods have been used in petroleum-related studies, but have not been applied to reservoir identification or to production prediction based on reservoir identification. Production forecasting studies are typically based on overall reservoir thickness and lose accuracy when a reservoir contains water or dry layers that produce no oil. In this paper, a systematic ML method was developed that uses classification models for reservoir identification and regression models for production prediction, with the production models built on the reservoir identification results. For reservoir identification, seven optimized ML methods were used: four typical single ML methods and three ensemble ML methods. These methods classify the reservoir into five types of layers: water, dry, and three levels of oil (I, II, and III oil layers). The validation and test results of these seven methods suggest that the three ensemble methods perform better than the four single ML methods in reservoir identification. XGBoost produced the most accurate model, with an accuracy of up to 99%. The effective thickness of the I and II oil layers determined during reservoir identification was fed into the production prediction models. Because effective thickness accounts for the distribution of water and oil, it yields more reasonable production predictions than overall reservoir thickness. To validate the superiority of the ML methods, reference models using overall reservoir thickness were built for comparison. The models based on effective thickness outperformed the reference models on every evaluation metric, and their prediction accuracy was 10% higher. Free of the personal errors and data distortion present in traditional methods, this system enables rapid data analysis and reduces the time required for reservoir classification and production prediction. The ML models using the effective thickness obtained from reservoir identification predicted oil production more accurately than previous studies that use overall reservoir thickness.
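Feeding only the effective thickness into the production models can be made concrete with a small helper. In this sketch the layer labels and the `(label, thickness)` tuple layout are hypothetical. Following the abstract, only layers classified as I or II oil layers count toward effective thickness, whereas the reference models use the overall thickness.

```python
from typing import List, Tuple

# Hypothetical class labels produced by the reservoir-identification classifier.
EFFECTIVE_CLASSES = {"I_oil", "II_oil"}

def effective_thickness(layers: List[Tuple[str, float]]) -> float:
    """Sum the thickness of layers classified as I or II oil layers."""
    return sum(t for label, t in layers if label in EFFECTIVE_CLASSES)

def overall_thickness(layers: List[Tuple[str, float]]) -> float:
    """Reference feature: total reservoir thickness regardless of layer class."""
    return sum(t for _, t in layers)

well = [("water", 2.0), ("I_oil", 3.5), ("dry", 1.0), ("II_oil", 1.5), ("III_oil", 0.5)]
print(effective_thickness(well), overall_thickness(well))  # 5.0 8.5
```

The water, dry, and III oil layers inflate the overall thickness without contributing production, which is why the effective-thickness feature gives the regression models a cleaner signal.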
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 10974068 and 10574057).
Abstract: This paper uses the classical ensemble method to study the double ionization of a two-dimensional (2D) model helium atom interacting with an elliptically polarized laser pulse. The classical ensemble calculation shows that the ratio of double to single ionization decreases as the ellipticity of the driving field increases. The classical picture shows that there are hardly any electron-electron recollisions with a circularly polarized laser pulse. The double ionization probability is studied for linearly and circularly polarized laser pulses. The classical numerical results are consistent with the semiclassical rescattering mechanism and in qualitative agreement with experimental results and quantum calculations.
Abstract: The aim is to update the parameters of a score constructed with an ensemble method involving linear discriminant analysis and logistic regression as new learning data arrive, in an online setting and without storing all previously obtained data. Poisson bootstrap and stochastic approximation processes, whose convergence has been established theoretically, were used with online standardized data to avoid numerical explosions. The empirical convergence of online ensemble scores to a reference “batch” score was studied on five datasets from which data streams were simulated, comparing six processes for constructing the online scores. For each score, 50 replications using a total of 10N observations (N being the size of the dataset) were performed to assess the convergence and stability of the method, computing the mean and standard deviation of a convergence criterion. A complementary study using 100N observations was also performed. All tested processes converged on all datasets after N iterations, except for one process on one dataset. The best processes were averaged processes using online standardized data and a piecewise constant step size.
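Below is a minimal sketch of an online ensemble score with a Poisson bootstrap, online standardized data, an averaged stochastic-gradient process, and a piecewise constant step size. Assumptions: every member is a logistic regression (the article's ensemble also uses linear discriminant analysis), and the step schedule `piecewise_step` and the class names are hypothetical illustrations, not the authors' processes. Each member treats an incoming observation as if it appeared k times, with k drawn from Poisson(1). This mimics bootstrap resampling without storing past data.

```python
import math
import random
from typing import Callable, List

class OnlineStandardizer:
    """Running mean and variance (Welford) used to standardize each new observation."""
    def __init__(self, d: int):
        self.n, self.mean, self.m2 = 0, [0.0] * d, [0.0] * d

    def update(self, x: List[float]) -> None:
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])

    def transform(self, x: List[float]) -> List[float]:
        out = []
        for j, v in enumerate(x):
            sd = math.sqrt(self.m2[j] / self.n) if self.n > 1 else 0.0
            out.append((v - self.mean[j]) / sd if sd > 0 else 0.0)
        return out

def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def poisson1(rng: random.Random) -> int:
    """Poisson(1) draw by Knuth's multiplication method: the bootstrap weight."""
    limit, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p < limit:
            return k
        k += 1

def piecewise_step(t: int) -> float:
    """Hypothetical piecewise constant step size, lowered every 1000 updates."""
    return 0.1 / (1 + t // 1000)

class OnlineLogit:
    """Logistic regression updated by stochastic gradient, with an averaged process."""
    def __init__(self, d: int, step: Callable[[int], float]):
        self.w = [0.0] * (d + 1)    # intercept followed by coefficients
        self.avg = [0.0] * (d + 1)  # running average of the iterates
        self.t, self.step = 0, step

    def update(self, z: List[float], y: int, weight: int) -> None:
        for _ in range(weight):  # repeat the update `weight` times (Poisson bootstrap)
            self.t += 1
            p = sigmoid(self.w[0] + sum(w * v for w, v in zip(self.w[1:], z)))
            a = self.step(self.t) * (y - p)  # log-likelihood gradient step
            self.w[0] += a
            for j, v in enumerate(z):
                self.w[j + 1] += a * v
            for j in range(len(self.w)):
                self.avg[j] += (self.w[j] - self.avg[j]) / self.t

class OnlineEnsembleScore:
    def __init__(self, d: int, n_members: int, rng: random.Random):
        self.std = OnlineStandardizer(d)
        self.members = [OnlineLogit(d, piecewise_step) for _ in range(n_members)]
        self.rng = rng

    def update(self, x: List[float], y: int) -> None:
        self.std.update(x)
        z = self.std.transform(x)  # online standardized observation
        for m in self.members:
            m.update(z, y, poisson1(self.rng))

    def coefficients(self) -> List[float]:
        """Mean of the members' averaged coefficients (standardized scale)."""
        size = len(self.members[0].avg)
        return [sum(m.avg[j] for m in self.members) / len(self.members) for j in range(size)]

rng = random.Random(1)
ens = OnlineEnsembleScore(d=2, n_members=10, rng=rng)
for _ in range(3000):  # simulated data stream: only x[0] is informative
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    y = 1 if x[0] + rng.gauss(0.0, 0.5) > 0 else 0
    ens.update(x, y)
print(ens.coefficients())
```

Only the running means, variances, and current coefficients are kept in memory, so storage does not grow with the length of the stream.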
Funding: Supported by the Young Researcher Grant of National Astronomical Observatories, Chinese Academy of Sciences, the National Basic Research Program of China (973 Program, Grant No. 2011CB811406), and the National Natural Science Foundation of China (Grant Nos. 10733020, 10921303, 11003026, and 11078010).
Abstract: An ensemble prediction model of solar proton events (SPEs) that combines information from solar flares and coronal mass ejections (CMEs) is built. In this model, solar flares are parameterized by peak flux, duration, and longitude, and CMEs by width, speed, and measurement position angle. The importance of each parameter for the occurrence of SPEs is estimated by the information gain ratio. We find that CME width and speed are more informative than flare peak flux and duration. Because the physical mechanism of SPEs is not well understood, a hidden naive Bayes approach, a probability-based machine learning method, is used to build the prediction model from observational data. SPEs originate from solar flares and/or shock waves associated with CMEs. Hence, we first build two base prediction models using the properties of solar flares and CMEs, respectively. The outputs of these models are then combined to generate the ensemble prediction model of SPEs. The ensemble model, which incorporates the complementary information of solar flares and CMEs, performs better than either base model alone.
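Ranking parameters by information gain ratio, as described above, can be sketched for discretized parameters. The feature names, bins, and event labels below are invented for illustration. Continuous quantities such as CME speed would first have to be binned, and the binning scheme is an assumption.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of `labels` given `feature`, divided by the split information."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0

spe = [1, 1, 1, 0, 0, 0]  # invented event labels
cme_width = ["wide", "wide", "wide", "narrow", "narrow", "narrow"]
flare_duration = ["long", "short", "long", "short", "long", "short"]
print(gain_ratio(cme_width, spe), gain_ratio(flare_duration, spe))
```

Dividing by the split information penalizes features with many distinct values, which would otherwise look informative simply because they partition the data finely.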
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Education, Nos. NRF-2018R1D1A1B07041091 and NRF-2019S1A5A8034211.
Abstract: BACKGROUND: Although Parkinson’s disease (PD) frequently progresses to Parkinson’s disease dementia (PDD), the evidence for diagnosing early-onset Parkinson dementia (EOPD) at an early stage is still insufficient. AIM: To explore the accuracy of sociodemographic factors, Parkinson’s motor symptoms, Parkinson’s non-motor symptoms, and rapid eye movement sleep disorder in predicting EOPD, using multicenter PD registry data. METHODS: This study analyzed 342 Parkinson patients younger than 65 years (66 EOPD patients and 276 PD patients with normal cognition). An EOPD prediction model was developed using a random forest algorithm, and its accuracy was compared with that of a naive Bayesian model and discriminant analysis. RESULTS: The overall accuracy of the random forest was 89.5%, higher than that of discriminant analysis (78.3%) and the naive Bayesian model (85.8%). In the random forest model, the Korean Mini-Mental State Examination (K-MMSE) score, the Korean Montreal Cognitive Assessment (K-MoCA) score, the sum of boxes of the Clinical Dementia Rating (CDR), the global CDR score, the motor score of the Unified Parkinson’s Disease Rating Scale (UPDRS), and the Korean Instrumental Activities of Daily Living (K-IADL) score were identified as the major variables with high weight for EOPD prediction. Among them, the K-MMSE score was the most important factor in the final model. CONCLUSION: In addition to cognitive screening indicators (e.g., K-MMSE and K-MoCA scores), Parkinson-related motor symptoms (e.g., the UPDRS motor score) and instrumental daily performance (e.g., the K-IADL score) were highly accurate predictors of EOPD.
Abstract: Recently, the combination of video services and 5G networks has been gaining attention in wireless communication. With the rapid growth of 5G network usage and the massive popularity of three-dimensional video streaming, the quality of experience (QoE) of video in 5G systems has become highly significant to both customers and service providers. Effectively categorizing QoE-aware video streaming is therefore imperative for achieving greater client satisfaction. This work makes the following contributions. First, a simulation platform based on NS-3 is introduced to analyze and improve the performance of video services. The simulation provides real-time measurements while avoiding the cost of real-world equipment. Second, a framework for QoE-aware video streaming categorization in 5G networks is introduced, based on machine learning (ML) and incorporating the hyperparameter tuning (HPT) principle. It implements an enhanced hyperparameter tuning (EHPT) ensemble and a decision tree (DT) classifier for video streaming categorization. The performance of the ML approach is assessed using precision, accuracy, recall, and computation time to demonstrate the superiority of these classifiers for video streaming categorization. The results show that the ML classifiers achieve QoE prediction accuracies of 92.59% for the EHPT ensemble and 87.037% for the DT classifier.
Funding: Supported by the Taishan Scholars (No. ts201712003).
Abstract: Nitrogen dioxide (NO₂) poses a critical potential risk to environmental quality and public health. A reliable machine learning (ML) forecasting framework can provide valuable information to support government decision-making. Based on data from 1609 air quality monitors across China from 2014 to 2020, this study designed an ensemble ML model that integrates multiple types of spatial-temporal variables and three sub-models for time-sensitive prediction over a wide area. The ensemble model adds a residual connection to a gated recurrent unit (GRU) network and combines the strengths of a Transformer, extreme gradient boosting (XGBoost), and the residual GRU network, giving a root mean square error on the test set that is 4.1% ± 1.0% lower than that of XGBoost. The ensemble model shows strong predictive performance on the test set, with coefficients of determination of 0.91, 0.86, and 0.77 for 1-hr, 3-hr, and 24-hr averages, respectively. In particular, the model performs well with low spatial uncertainty in Central, East, and North China, the major site-dense regions. An interpretability analysis based on Shapley values at different temporal resolutions shows that atmospheric chemical processes contribute more to hourly predictions than to daily predictions, whereas meteorological conditions become more prominent for daily predictions. Compared with existing models at different spatiotemporal scales, the present model can be deployed at any air quality monitoring station across China to provide rapid and dependable NO₂ forecasts, which will help in developing effective control policies.
Abstract: Industrial Internet of Things (IIoT) systems depend on a growing number of edge devices, such as sensors, controllers, and robots, for data collection, transmission, storage, and processing. Malicious or abnormal behavior by any of these devices can jeopardize the security of the entire IIoT. Moreover, such behavior can allow malicious software installed on end nodes to penetrate the network. This paper presents a parallel ensemble model for threat hunting based on anomalies in the behavior of IIoT edge devices. The proposed model is flexible enough to use several state-of-the-art classifiers as base learners and efficiently classifies multi-class anomalies using multi-class AdaBoost and majority voting. Experimental evaluations on a dataset of multi-source normal records and multi-class anomalies show that the model outperforms existing approaches in accuracy, F1 score, recall, and precision.
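The two combination rules named above can be sketched as follows. This is not the authors' implementation: the class labels and error rates are invented. `samme_alpha` uses the standard SAMME weight of multi-class AdaBoost, log((1 - err)/err) + log(K - 1), where K is the number of classes.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence

def samme_alpha(weighted_error: float, n_classes: int) -> float:
    """SAMME classifier weight; stays positive while error < 1 - 1/K."""
    return math.log((1.0 - weighted_error) / weighted_error) + math.log(n_classes - 1)

def weighted_vote(predictions: Sequence[Hashable], alphas: Sequence[float]) -> Hashable:
    """Boosting-style combination: the label with the largest total weight wins."""
    scores: Dict[Hashable, float] = {}
    for label, alpha in zip(predictions, alphas):
        scores[label] = scores.get(label, 0.0) + alpha
    return max(scores, key=scores.get)

def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Plain majority vote across parallel learners; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]

alphas = [samme_alpha(e, 4) for e in (0.2, 0.5, 0.6)]  # invented weighted error rates
preds = ["dos", "scan", "scan"]                         # invented anomaly labels
print(majority_vote(preds), weighted_vote(preds, alphas))  # scan dos
```

The two rules can disagree: the two weaker learners outvote the strongest one under a plain majority, but not under SAMME weights.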
Abstract: In this paper, an optimized Light Gradient Boosting Machine (LGBM) technique is proposed to identify intrusive activities in Internet of Things (IoT) networks. The major contributions are as follows: i) an optimized LGBM model is developed to identify malicious activities in IoT networks; ii) an efficient evolutionary optimization approach is adopted to find the optimal set of LGBM hyper-parameters for this problem, using a Genetic Algorithm (GA) with k-way tournament selection and uniform crossover to explore the hyper-parameter search space efficiently; iii) the performance of the proposed model is compared with state-of-the-art ensemble learning and machine learning models to assess its overall generalization performance and efficiency. Simulation results show that the proposed approach outperforms the other methods considered and is a robust approach to intrusion detection in IoT environments.
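A minimal genetic algorithm with k-way tournament selection and uniform crossover, as described above, might look like this. Assumptions: the search space is discrete, and its keys only mirror common LightGBM hyper-parameter names (no LightGBM call is made). `toy_fitness` is a hypothetical placeholder for a cross-validated score. The elitism and mutation steps are illustrative choices not stated in the abstract.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical discrete search space (names mirror common LightGBM hyper-parameters).
SPACE: Dict[str, list] = {
    "num_leaves": [15, 31, 63, 127],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [3, 5, 7, -1],
}

def random_individual(rng: random.Random) -> dict:
    return {k: rng.choice(v) for k, v in SPACE.items()}

def tournament(pop: List[dict], fits: List[float], k: int, rng: random.Random) -> dict:
    """k-way tournament: sample k distinct individuals and return the fittest."""
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: fits[i])]

def uniform_crossover(a: dict, b: dict, rng: random.Random) -> dict:
    """Each gene is copied from either parent with probability 1/2."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}

def mutate(ind: dict, rate: float, rng: random.Random) -> dict:
    """Resample each gene from its value list with probability `rate`."""
    return {k: (rng.choice(SPACE[k]) if rng.random() < rate else v) for k, v in ind.items()}

def run_ga(fitness: Callable[[dict], float], pop_size: int = 20, generations: int = 40,
           k: int = 3, mut_rate: float = 0.1, seed: int = 0) -> Tuple[dict, float]:
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        children = [pop[max(range(pop_size), key=lambda i: fits[i])]]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fits, k, rng), tournament(pop, fits, k, rng)
            children.append(mutate(uniform_crossover(p1, p2, rng), mut_rate, rng))
        pop = children
    fits = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]

def toy_fitness(ind: dict) -> float:
    """Placeholder for a cross-validated model score (higher is better)."""
    return (-abs(ind["num_leaves"] - 63) / 63
            - abs(ind["learning_rate"] - 0.05) * 10
            - (0.0 if ind["max_depth"] == 7 else 1.0))

best, score = run_ga(toy_fitness)
print(best, score)
```

In practice each fitness call would train and validate a model, so the population size and number of generations trade search quality against compute time.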
Funding: This work was financially supported by the National Natural Science Foundation of China (Nos. 51978589, 51778544, and 51525804).
Abstract: The response of a train–bridge system shows clearly random behavior. High traffic density and a long track maintenance period substantially increase the number of trains running on a bridge, and there is only a small probability that the maximum responses of the train and bridge occur within the total maintenance period of the track. First, the coupling model of train–bridge systems is reviewed. Then, an ensemble method is presented that can estimate small probabilities for a dynamic system under stochastic excitation. The main idea of the ensemble method is to replace the physical model with a NARX (nonlinear autoregressive with exogenous input) model and to apply subset simulation with splitting to obtain the extreme-value distribution. Finally, the efficiency of the proposed method is compared with that of direct Monte Carlo simulation, and the exceedance probability of train responses under vertical track irregularity is discussed. The results show that, when estimating small probabilities of train responses under vertical track irregularity, the ensemble method reduces both the computation time per sample and the required number of samples.
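The subset simulation step can be sketched for a standard-normal input space. Assumptions: the cheap analytical response `g` plays the role of the article's NARX surrogate, the "splitting" refinement is not shown, and `n`, `p0`, and the proposal spread are illustrative. Each level conditions on exceeding an intermediate threshold chosen so that a fraction `p0` of samples lies above it. The small probability is thus built up as a product of larger conditional probabilities.

```python
import math
import random
from typing import Callable, List

def subset_simulation(g: Callable[[List[float]], float], dim: int, threshold: float,
                      n: int = 2000, p0: float = 0.1, max_levels: int = 10,
                      seed: int = 0) -> float:
    """Estimate P(g(X) >= threshold) for X ~ N(0, I_dim) by subset simulation."""
    rng = random.Random(seed)
    samples = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    values = [g(x) for x in samples]
    prob = 1.0
    n_seeds = int(p0 * n)
    chain_len = n // n_seeds
    for _ in range(max_levels):
        order = sorted(range(n), key=lambda i: values[i], reverse=True)
        b = values[order[n_seeds - 1]]  # intermediate threshold
        if b >= threshold:  # final level: count direct exceedances below
            break
        prob *= p0
        new_samples, new_values = [], []
        for i in order[:n_seeds]:  # grow a modified Metropolis chain from each seed
            x, v = samples[i], values[i]
            for _ in range(chain_len):
                cand = []
                for xj in x:  # component-wise random walk targeting N(0, 1)
                    prop = xj + rng.gauss(0.0, 1.0)
                    accept = math.exp(min(0.0, -0.5 * (prop * prop - xj * xj)))
                    cand.append(prop if rng.random() < accept else xj)
                gc = g(cand)
                if gc >= b:  # stay inside the conditional region {g >= b}
                    x, v = cand, gc
                new_samples.append(x)
                new_values.append(v)
        samples, values = new_samples, new_values
    return prob * sum(1 for v in values if v >= threshold) / n

def g(x: List[float]) -> float:
    """Stand-in response: standard normal by construction."""
    return (x[0] + x[1]) / math.sqrt(2.0)

p = subset_simulation(g, dim=2, threshold=3.5)
exact = 0.5 * math.erfc(3.5 / math.sqrt(2.0))
print(p, exact)
```

For this linear response the exact answer is known, so the estimate can be checked. With a surrogate model, the same loop would call the surrogate instead of `g`, which is where the savings in time per sample come from.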
Funding: Supported by the National Natural Science Foundation of China (61773405, 61751312) and the Major Scientific and Technological Innovation Projects of Shandong Province (2019JZZY020123).
Abstract: Extreme learning machine (ELM) has been shown to be an effective learning mechanism for pattern classification and regression. However, its good performance relies on a large number of hidden-layer nodes, and the computational cost increases greatly as the number of hidden nodes grows. In this paper, we propose a novel algorithm, the constrained voting extreme learning machine (CV-ELM). Unlike the traditional ELM, CV-ELM determines the input weights and biases from the differences between samples of different classes. To further improve accuracy, voting selection is introduced. The proposed method is evaluated on public benchmark datasets, and the experimental results show that it outperforms the original ELM algorithm. We further apply CV-ELM to classifying the superheat degree (SD) state in the aluminum electrolysis industry, where the recognition accuracy reaches 87.4%. The experimental results demonstrate that the proposed method is more robust than existing state-of-the-art identification methods.
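A constrained, voting ELM along the lines described above can be sketched as follows. The abstract does not give the exact CV-ELM construction, so the details here are assumptions:

- each hidden node's input weight is the difference between a randomly drawn pair of samples from different classes;
- its bias places the node's hyperplane at the pair's midpoint;
- the output weights come from ridge-regularized least squares;
- "voting" is a majority vote over several independently seeded networks.

```python
import math
import random
from typing import Callable, List, Tuple

Node = Tuple[List[float], float]

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]

def constrained_nodes(X: List[List[float]], y: List[int], n_hidden: int,
                      rng: random.Random) -> List[Node]:
    """Hypothetical constrained initialization from random cross-class sample pairs."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    nodes = []
    for _ in range(n_hidden):
        a, b = rng.choice(pos), rng.choice(neg)
        w = [ai - bi for ai, bi in zip(a, b)]
        mid = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]
        nodes.append((w, -sum(wi * mi for wi, mi in zip(w, mid))))  # hyperplane at midpoint
    return nodes

def hidden_layer(x: List[float], nodes: List[Node]) -> List[float]:
    return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + bias) for w, bias in nodes]

def fit_elm(X: List[List[float]], y: List[int], n_hidden: int = 10, ridge: float = 1e-3,
            seed: int = 0) -> Callable[[List[float]], int]:
    rng = random.Random(seed)
    nodes = constrained_nodes(X, y, n_hidden, rng)
    H = [hidden_layer(x, nodes) for x in X]
    T = [1.0 if t == 1 else -1.0 for t in y]
    # Output weights: beta = (H^T H + ridge * I)^(-1) H^T T.
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * t for h, t in zip(H, T)) for i in range(n_hidden)]
    beta = solve(A, rhs)
    return lambda x: 1 if sum(b * h for b, h in zip(beta, hidden_layer(x, nodes))) > 0 else 0

def vote(models: List[Callable[[List[float]], int]], x: List[float]) -> int:
    """Majority vote over independently seeded ELMs."""
    return 1 if 2 * sum(m(x) for m in models) > len(models) else 0

X = [[2.0, 2.0], [2.5, 1.5], [1.5, 2.5], [3.0, 2.0], [2.0, 3.0],
     [-2.0, -2.0], [-2.5, -1.5], [-1.5, -2.5], [-3.0, -2.0], [-2.0, -3.0]]
y = [1] * 5 + [0] * 5
models = [fit_elm(X, y, seed=s) for s in range(5)]
acc = sum(vote(models, x) == t for x, t in zip(X, y)) / len(X)
print(acc)
```

Because each hidden node is anchored between real samples of opposite classes, every node already carries class-discriminating information. This is the intuition behind needing fewer hidden nodes than an ELM with purely random weights.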
Abstract: Anomaly detection in smart homes can help enhance the health and safety of people who live alone. Previous studies on this topic have given less attention to hybrid methods. This paper presents a two-step hybrid probabilistic anomaly detection model for smart homes. First, it employs several algorithms with different characteristics to detect anomalies in sensor data. Then, it aggregates their results using a Bayesian network, in which abnormal events are detected by calculating the probability of abnormality given the anomaly detection results of the base methods. Experimental evaluation on a real dataset shows that the proposed method is effective, reducing false positives and increasing true positives.
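The second step, aggregating base detectors with a Bayesian network, can be illustrated with the simplest such network. In it, the true state (normal or abnormal) is the single parent of every detector's output. This naive-Bayes structure and all the probabilities below are assumptions for illustration; the article's network may be structured differently.

```python
from typing import Sequence

def posterior_abnormal(flags: Sequence[int], prior: float,
                       p_flag_given_abn: Sequence[float],
                       p_flag_given_norm: Sequence[float]) -> float:
    """P(abnormal | detector flags) when each detector depends only on the true state."""
    like_abn, like_norm = prior, 1.0 - prior
    for f, pa, pn in zip(flags, p_flag_given_abn, p_flag_given_norm):
        like_abn *= pa if f else (1.0 - pa)
        like_norm *= pn if f else (1.0 - pn)
    return like_abn / (like_abn + like_norm)

# Invented detector characteristics: true-positive and false-positive rates.
tpr = [0.9, 0.8, 0.7]
fpr = [0.1, 0.2, 0.05]
print(posterior_abnormal([1, 1, 0], 0.05, tpr, fpr))  # about 0.374
print(posterior_abnormal([1, 1, 1], 0.05, tpr, fpr))  # about 0.964
```

A single agreeing detector with a low false-positive rate shifts the posterior sharply. This is one way aggregation can suppress false positives that any one base method would raise alone.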
Funding: G. Chen was supported by the National Natural Science Foundation of China (NSFC) (11801063), the China Postdoctoral Science Foundation (2018M633339, 2019T120808), and the Fundamental Research Funds for the Central Universities (YJ202030). Y. Zhang was supported by the US National Science Foundation (NSF) (DMS-1619904).
Abstract: A new second-order time-stepping ensemble hybridizable discontinuous Galerkin method is developed for parameterized convection-diffusion PDEs with various initial and boundary conditions, body forces, and time-dependent coefficients. For ensemble solutions in L^∞(0,T;L²(Ω)), a superconvergent rate with respect to the degrees of freedom of the globally coupled unknowns is established for polynomials of any degree k ≥ 0. The results of numerical experiments are consistent with the theoretical findings.
Funding: Supported by the Excellent National State Key Laboratory Project (49823002) and the National Key Project 'Study on Chinese Short
Abstract: This study examines how to extract useful information for monthly dynamical prediction from ensemble forecasts. The National Climate Center of China provided extended-range ensemble forecasts of the 500 hPa height field from January to May 1997, made with the global spectral model T63L16. The ensemble has 8 members, with initial perturbations generated by the lagged average forecast (LAF) method from initial times of 0000, 0600, 1200, and 1800 GMT on two consecutive days. There is a significant relationship between forecast skill and the ensemble spread, measured as the root-mean-square deviation of the ensemble members from the ensemble mean. Skill is measured by the anomaly correlation or by the root-mean-square distance between the ensemble mean forecast and the observations. The ensemble spread can be used to estimate the number of useful forecast days N for the best estimate of the 30-day mean. Thus, a weighted mean approach based on ensemble spread is proposed for monthly dynamical prediction. The anomaly correlation of the spread-weighted monthly mean is higher than that of both the arithmetic mean and the linear weighted mean, and the spread-weighted mean gives better results for the monthly mean circulation and anomaly. Keywords: monthly prediction; ensemble method; spread of ensemble. Supported by the Excellent National State Key Laboratory Project (49823002), the National Key Project ‘Study on Chinese Short-Term Climate Forecast System’ (96-908-02), and the IAP Innovation Foundation (8-1308). The data were provided through the National Climate Center of China. The authors wish to thank Ms. Chen Lijuan for her assistance.
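The spread measure and a spread-based weighting can be sketched as follows. The spread is the root-mean-square deviation of members from the ensemble mean, as in the abstract. The abstract does not give the weighting formula, so the inverse-spread daily weights in `spread_weighted_mean` are an assumed scheme, and the toy fields are invented.

```python
import math
from typing import List, Sequence

def ensemble_mean(members: Sequence[Sequence[float]]) -> List[float]:
    n_mem = len(members)
    return [sum(m[j] for m in members) / n_mem for j in range(len(members[0]))]

def ensemble_spread(members: Sequence[Sequence[float]]) -> float:
    """RMS deviation of the members from the ensemble mean, pooled over grid points."""
    mean = ensemble_mean(members)
    sq = sum((m[j] - mean[j]) ** 2 for m in members for j in range(len(mean)))
    return math.sqrt(sq / (len(members) * len(mean)))

def spread_weighted_mean(daily_members: Sequence[Sequence[Sequence[float]]],
                         eps: float = 1e-12) -> List[float]:
    """Assumed scheme: daily ensemble-mean fields weighted by 1 / spread."""
    weights = [1.0 / max(ensemble_spread(day), eps) for day in daily_members]
    means = [ensemble_mean(day) for day in daily_members]
    total = sum(weights)
    return [sum(w * m[j] for w, m in zip(weights, means)) / total
            for j in range(len(means[0]))]

# Two forecast days, two members, one grid point (invented 500 hPa heights).
days = [[[10.0], [12.0]], [[20.0], [26.0]]]
print(spread_weighted_mean(days))  # close to [14.0]
```

Days on which the members disagree more (larger spread) contribute less to the monthly mean, which reflects the finding that spread tracks forecast skill.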
文摘Time lapse, characteristic of aging, is a complex process that affects the reliability and security of biometric face recognition systems. This paper reports the novel use and effectiveness of deep learning, in general, and convolutional neural networks (CNN), in particular, for automatic rather than hand-crafted feature extraction for robust face recognition across time lapse. A CNN architecture using the VGG-Face deep (neural network) learning is found to produce highly discriminative and interoperable features that are robust to aging variations even across a mix of biometric datasets. The features extracted show high inter-class and low intra-class variability leading to low generalization errors on aging datasets using ensembles of subspace discriminant classifiers. The classification results for the all-encompassing authentication methods proposed on the challenging FG-NET and MORPH datasets are competitive with state-of-the-art methods including commercial face recognition engines and are richer in functionality and interoperability than existing methods as it handles mixed biometric datasets, e.g., FG-NET and MORPH.
Abstract: In recent years, the COVID-19 pandemic has negatively affected all aspects of social life. The virus is easily transmitted through small liquid particles from the mouth or nose when people cough, sneeze, speak, sing, or breathe, so it can spread quickly and cause severe health problems. According to research and World Health Organization (WHO) recommendations, one of the most economical and effective ways to prevent the spread of the pandemic is for people to wear face masks in public spaces. A face mask helps prevent droplets and aerosols from passing from person to person, reducing the risk of infection. This simple method can reduce the spread of the particles by up to 95%. However, its effectiveness depends heavily on social consciousness, which is sometimes unreliable. To improve the effectiveness of mask wearing in public spaces, this research proposes an approach for detecting and warning people who do not wear a face mask or wear one incorrectly. The approach uses deep learning models, namely GoogLeNet, AlexNet, and VGG16, whose results are combined by an ensemble method, i.e., bagging. The experimental results show that the approach achieves more than 95% accuracy in face mask recognition.
Abstract: Background: With improvements in next-generation DNA sequencing technology, collecting genetic data has become less costly, and more machine learning techniques can be used to support cancer analysis and diagnosis. Methods: We developed an ensemble machine learning system, named the performance-weighted-voting model, for cancer type classification in 6,249 samples across 14 cancer types. The ensemble consists of five weak classifiers (logistic regression, SVM, random forest, XGBoost, and neural networks). We first used cross-validation to obtain the predictions of the five classifiers. The weights of the five weak classifiers were then obtained from their predictive performance by solving linear regression functions. The final predicted probability of the performance-weighted-voting model for a cancer type is the sum of each classifier's weight multiplied by its predicted probability. Results: Using the somatic mutation count of each gene as the input feature, the overall accuracy of the performance-weighted-voting model reached 71.46%. This was significantly higher than the accuracy of the five weak classifiers and of two other ensemble models, the hard-voting model and the soft-voting model. In addition, by analyzing the predictive pattern of the performance-weighted-voting model, we found that for most cancer types, higher tumor mutational burden improves overall accuracy. Conclusion: This study has important clinical significance for identifying the origin of cancer, especially when the primary site cannot be determined. In addition, our model presents a good strategy for using ensemble systems for cancer type classification.
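The performance-weighted voting step can be sketched as a least-squares problem. Assumptions: the regression has no intercept, is pooled over all cancer types (one weight per classifier), and is fitted on out-of-fold probabilities. The toy probabilities are invented. With one perfect classifier and one uninformative classifier, the fitted weights should concentrate on the perfect one.

```python
from typing import List, Sequence, Tuple

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_weights(probs: Sequence[Sequence[Sequence[float]]], labels: Sequence[int],
                n_classes: int, ridge: float = 1e-9) -> List[float]:
    """probs[m][i][c]: classifier m's out-of-fold probability that sample i is class c.
    Minimizes sum over (i, c) of (1[y_i = c] - sum_m w_m * probs[m][i][c])^2."""
    n_models = len(probs)
    rows, targets = [], []
    for i, y in enumerate(labels):
        for c in range(n_classes):
            rows.append([probs[m][i][c] for m in range(n_models)])
            targets.append(1.0 if y == c else 0.0)
    A = [[sum(r[a] * r[b] for r in rows) + (ridge if a == b else 0.0)
          for b in range(n_models)] for a in range(n_models)]
    rhs = [sum(r[a] * t for r, t in zip(rows, targets)) for a in range(n_models)]
    return solve(A, rhs)

def combine(weights: Sequence[float],
            sample_probs: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Weighted sum of the classifiers' probabilities and the predicted class."""
    n_classes = len(sample_probs[0])
    scores = [sum(w * p[c] for w, p in zip(weights, sample_probs)) for c in range(n_classes)]
    return scores, max(range(n_classes), key=lambda c: scores[c])

labels = [0, 1, 0, 1]
probs_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # perfect classifier
probs_b = [[0.5, 0.5]] * 4                                   # uninformative classifier
w = fit_weights([probs_a, probs_b], labels, n_classes=2)
print(w, combine(w, [[0.2, 0.8], [0.5, 0.5]]))
```

Fitting the weights on out-of-fold predictions, as the abstract describes with cross-validation, keeps a classifier that overfits its training data from receiving an inflated weight.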