Smart manufacturing and Industry 4.0 are transforming traditional manufacturing processes by utilizing innovative technologies such as artificial intelligence (AI) and the Internet of Things (IoT) to enhance efficiency, reduce costs, and ensure product quality. In light of the recent advancement of Industry 4.0, identifying defects has become important for ensuring the quality of products during the manufacturing process. In this research, we present an ensemble methodology for accurately classifying hot-rolled steel surface defects by combining the strengths of four pre-trained convolutional neural network (CNN) architectures (VGG16, VGG19, Xception, and MobileNetV2) while compensating for their individual weaknesses. We evaluated our methodology on the Xsteel surface defect dataset (XSDD), which comprises seven different classes. The ensemble methodology integrated the predictions of the individual models through two methods: model averaging and weighted averaging. Our evaluation showed that the model averaging ensemble achieved an accuracy of 98.89%, a recall of 98.92%, a precision of 99.05%, and an F1-score of 98.97%, while the weighted averaging ensemble reached an accuracy of 99.72%, a recall of 99.74%, a precision of 99.67%, and an F1-score of 99.70%. The proposed weighted averaging ensemble model outperformed the model averaging method and the individual models in detecting defects in terms of accuracy, recall, precision, and F1-score. Comparative analysis with recent studies also showed the superior performance of our methodology.
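The difference between model averaging and weighted averaging can be illustrated with a minimal sketch. The abstract does not say how the weights were chosen, so the probabilities and weights below are hypothetical, and `ensemble_probabilities` and `predict` are illustrative names rather than the authors' code.

```python
from typing import List, Optional, Sequence


def ensemble_probabilities(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine per-model class probabilities into one distribution.

    With weights=None every model counts equally (model averaging);
    otherwise the weights are normalized to sum to 1 (weighted averaging).
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    n_classes = len(model_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def predict(probs: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=lambda c: probs[c])


# Hypothetical softmax outputs of four models for one image, three classes.
model_probs = [
    [0.50, 0.40, 0.10],
    [0.50, 0.40, 0.10],
    [0.30, 0.60, 0.10],
    [0.55, 0.35, 0.10],
]
# Hypothetical weights, e.g. derived from each model's validation accuracy.
weights = [0.1, 0.1, 0.7, 0.1]

plain = ensemble_probabilities(model_probs)             # model averaging
weighted = ensemble_probabilities(model_probs, weights)  # weighted averaging
print(predict(plain), predict(weighted))
```

In this toy case the equal-weight average follows the majority of models, while the weighted average follows the single model that was given most of the weight, which is the mechanism by which a weighted ensemble can differ from a plain average.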
This study aimed to prepare landslide susceptibility maps for the Pithoragarh district in Uttarakhand, India, using advanced ensemble models that combined Radial Basis Function Networks (RBFN) with three ensemble learning techniques: Dagging (DG), MultiBoost (MB), and AdaBoost (AB). This combination resulted in three distinct ensemble models: DG-RBFN, MB-RBFN, and AB-RBFN. Additionally, a traditional weighted method, Information Value (IV), and a benchmark machine learning (ML) model, the Multilayer Perceptron Neural Network (MLP), were employed for comparison and validation. The models were developed using ten landslide conditioning factors: slope, aspect, elevation, curvature, land cover, geomorphology, overburden depth, lithology, distance to rivers, and distance to roads. These factors were used to predict the output variable, the probability of landslide occurrence. Statistical analysis of the models' performance indicated that the DG-RBFN model, with an Area Under the ROC Curve (AUC) of 0.931, outperformed the other models. The AB-RBFN model achieved an AUC of 0.929, the MB-RBFN model had an AUC of 0.913, and the MLP model recorded an AUC of 0.926. These results suggest that the advanced ensemble ML model DG-RBFN was more accurate than the traditional statistical model, the single MLP model, and the other ensemble models in preparing trustworthy landslide susceptibility maps, thereby enhancing land use planning and decision-making.
Existing web-based security applications have failed in many situations due to the great intelligence of attackers. Cross-Site Scripting (XSS) is one of the dangerous attacks on web applications, in which an organization's or user's information can be modified. To avoid these security challenges, this article proposes a novel, all-encompassing combination of machine learning (NB, SVM, k-NN) and deep learning (RNN, CNN, LSTM) frameworks for detecting and defending against XSS attacks with high accuracy and efficiency. Based on the representation, a novel idea for merging a stacking ensemble with web applications, termed “hybrid stacking”, is proposed. To implement these methods, four distinct datasets, each of which contains both safe and unsafe content, are considered. The hybrid detection method can adaptively identify attacks from the URL, and the defense mechanism inherits the advantages of URL encoding with dictionary-based mapping to improve prediction accuracy, accelerate the training process, and effectively remove unsafe JScript/JavaScript keywords from the URL. The simulation results show that the proposed hybrid model is more efficient than existing detection methods. It produces XSS attack classification results above 99.5% (accuracy, precision, recall, f1_score, and Receiver Operating Characteristic (ROC)) and is highly resistant to XSS attacks. To ensure the security of the server's information, the proposed hybrid approach is demonstrated in a real-time environment.
With the advancement of artificial intelligence, traffic forecasting is gaining more and more interest in optimizing route planning and enhancing service quality. Traffic volume is an influential parameter for planning and operating traffic structures. This study proposes an improved ensemble-based deep learning method to solve traffic volume prediction problems. A set of optimal hyperparameters is also applied to the suggested approach to improve the performance of the learning process. The fusion of these methodologies aims to harness the capacity of ensemble empirical mode decomposition to discern complex traffic patterns and the proficiency of long short-term memory in learning temporal relationships. First, a dataset from automatic vehicle identification is obtained and used in the preprocessing stage of the ensemble empirical mode decomposition model. Second, traffic volume is predicted using the long short-term memory algorithm. Next, the study employs a trial-and-error approach to select a set of optimal hyperparameters, including the lookback window, the number of neurons in the hidden layers, and the gradient descent optimization. Finally, the fusion of the obtained results leads to a final traffic volume prediction. The experimental results show that the proposed method outperforms other benchmarks on various evaluation measures, including mean absolute error, root mean squared error, mean absolute percentage error, and R-squared. The achieved R-squared value reaches 98%, while the other evaluation indices surpass those of the competing methods. These findings highlight the accuracy of traffic pattern prediction and offer promising prospects for enhancing transportation management systems and urban infrastructure planning.
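The four evaluation measures named above have standard definitions, sketched below in plain Python. The observed and predicted traffic volumes are made-up numbers for illustration only, not data from the study, and `regression_metrics` is a hypothetical helper name.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Compute MAE, RMSE, MAPE (in percent) and R-squared.

    Assumes y_true contains no zeros (MAPE divides by the observed value)
    and is not constant (R-squared divides by its total sum of squares).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical hourly traffic volumes (vehicles per hour).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE and RMSE are in the units of the traffic volume, MAPE is scale-free, and R-squared measures the share of variance explained, which is why such studies usually report all four together.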
To improve the performance of attribute reduction algorithms on noisy and uncertain large data, a novel co-evolutionary cloud-based attribute ensemble multi-agent reduction (CCAEMR) algorithm is proposed. First, a co-evolutionary cloud framework is designed under the MapReduce mechanism to divide the entire population into different co-evolutionary subpopulations with a self-adaptive scale. Meanwhile, these subpopulations share their rewards to accelerate attribute reduction. Second, a multi-agent ensemble strategy of co-evolutionary elitist optimization is constructed to ensure that subpopulations can exploit any correlation and interdependency between interacting attribute subsets while reinforcing noise tolerance. Hence, these agents are kept within the stable elitist region to achieve the optimal profit. The experimental results show that the proposed CCAEMR algorithm has better efficiency and feasibility for solving large-scale and uncertain dataset problems with complex noise.
Essential proteins are crucial for biological processes and can be identified through both experimental and computational methods. While experimental approaches are highly accurate, they often demand extensive time and resources. To address these challenges, we present a computational ensemble learning framework designed to identify essential proteins more efficiently. Our method begins by using node2vec to transform proteins in the protein–protein interaction (PPI) network into continuous, low-dimensional vectors. We also extract a range of features from protein sequences, including graph-theory-based, information-based, compositional, and physicochemical attributes. Additionally, we leverage deep learning techniques to analyze high-dimensional position-specific scoring matrices (PSSMs) and capture evolutionary information. We then combine these features for classification using various machine learning algorithms. To enhance performance, we integrate the outputs of these algorithms through ensemble methods such as voting, weighted averaging, and stacking. This approach effectively addresses data imbalances and improves both robustness and accuracy. Our ensemble learning framework achieves an AUC of 0.960 and an accuracy of 0.9252, outperforming other computational methods. These results demonstrate the effectiveness of our approach in accurately identifying essential proteins and highlight its superior feature extraction capabilities.
This study explored the impact of coastal radar observability on the forecast of the track and rainfall of Typhoon Morakot (2009) using a WRF-based ensemble Kalman filter (EnKF) data assimilation (DA) system. The results showed that the performance of radar EnKF DA was quite sensitive to the number of radars being assimilated and to the DA timing relative to the landfall of the tropical cyclone (TC). It was found that assimilating radial velocity (Vr) data from all four operational radars during the 6 h immediately before TC landfall was quite important for the track and rainfall forecasts after the TC made landfall. The TC track forecast error could be decreased by about 43% and the 24-h rainfall forecast skill could be almost tripled. Assimilating Vr data from a single radar outperformed the experiment without DA, though with less improvement than the multiple-radar DA experiment. Different forecast performances were obtained by assimilating different radars, which was closely related to the first-time wind analysis increment, the location of moisture transport, the quasi-stationary rainband, and the local convergence line. However, assimilating Vr data only when the TC was farther away from making landfall might worsen TC track and rainfall forecasts. This work also demonstrated that Vr data from multiple radars, instead of a single radar, should be used for verification to obtain a more reliable assessment of EnKF performance.
Rockburst prediction is of vital significance to the design and construction of underground hard rock mines. A rockburst database consisting of 102 case histories, i.e., 1998–2011 period data from 14 hard rock mines, was examined for rockburst prediction in burst-prone mines by three tree-based ensemble methods. The dataset was examined with six widely accepted indices: the maximum tangential stress around the excavation boundary (MTS), the uniaxial compressive strength (UCS) and uniaxial tensile strength (UTS) of the intact rock, the stress concentration factor (SCF), the rock brittleness index (BI), and the strain energy storage index (EEI). Two boosting algorithms (AdaBoost.M1 and SAMME) and a bagging algorithm, each with classification trees as the base classifier, were evaluated on their ability to learn rockburst. The available dataset was randomly divided into a training set (2/3 of the whole dataset) and a testing set (the remainder). Repeated 10-fold cross validation (CV) was applied as the validation method for tuning the hyper-parameters. Margin analysis and variable relative importance were employed to analyze some characteristics of the ensembles. According to 10-fold CV, the accuracy analysis of the rockburst dataset demonstrated that bagging is the best method for predicting rockburst potential when compared with the AdaBoost.M1 and SAMME algorithms and empirical criteria methods.
On 21 July 2012, an extreme rainfall event with a recorded maximum 24-hour rainfall of 460 mm occurred in Beijing, China. Most operational models failed to predict such an extreme amount. In this study, a convective-permitting ensemble forecast system (CEFS), at 4-km grid spacing and covering the entire mainland of China, is applied to this extreme rainfall case. CEFS consists of 22 members and uses multiple physics parameterizations. For the event, the predicted maximum is 415 mm d⁻¹ in the probability-matched ensemble mean. The predicted high-probability heavy rain region is located in southwest Beijing, as was observed. Ensemble-based verification scores are then investigated. For a small verification domain covering Beijing and its surrounding areas, the precipitation rank histogram of CEFS is much flatter than that of a reference global ensemble. CEFS has a lower (higher) Brier score and a higher resolution than the global ensemble for precipitation, indicating more reliable probabilistic forecasting by CEFS. Additionally, forecasts of different ensemble members are compared and discussed. Most of the extreme rainfall comes from convection in the warm sector east of an approaching cold front. A few members of CEFS successfully reproduce such precipitation, and orographic lift of highly moist low-level flows with a significant southeasterly component is suggested to have played an important role in producing the initial convection. Comparisons between good and bad forecast members indicate a strong sensitivity of the extreme rainfall to the mesoscale environmental conditions and, to a lesser extent, to the model physics.
This study examines the effectiveness of ensemble Kalman filters in data assimilation with the strongly nonlinear dynamics of the Lorenz-63 model, and in particular their use in predicting the regime transition that occurs when the model jumps from one basin of attraction to the other. Four configurations of ensemble-based Kalman filtering data assimilation techniques, namely the ensemble Kalman filter, the ensemble adjustment Kalman filter, the ensemble square root filter, and the ensemble transform Kalman filter, are evaluated on their ability to predict the regime transition (also called phase transition) and are compared in terms of their sensitivity to both observational and sampling errors. The sensitivity of each ensemble-based filter to the size of the ensemble is also examined.
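To give a feel for what one analysis step of such a filter does, here is a minimal sketch of an ensemble square root filter update for a single, directly observed scalar variable. It is a toy under stated assumptions (one variable, identity observation operator, made-up numbers), not the Lorenz-63 setup of the study, and `ensrf_scalar_update` is a hypothetical name.

```python
import math
from statistics import mean, variance
from typing import List, Sequence


def ensrf_scalar_update(
    forecast: Sequence[float], obs: float, obs_var: float
) -> List[float]:
    """One deterministic square-root analysis step for a scalar state.

    The ensemble mean is pulled toward the observation with the Kalman
    gain K = Pf / (Pf + R); the deviations from the mean are shrunk by
    sqrt(1 - K) so that the analysis variance equals (1 - K) * Pf.
    """
    xf_mean = mean(forecast)
    pf = variance(forecast)            # sample variance (divides by n - 1)
    gain = pf / (pf + obs_var)
    xa_mean = xf_mean + gain * (obs - xf_mean)
    shrink = math.sqrt(1.0 - gain)
    return [xa_mean + shrink * (x - xf_mean) for x in forecast]


# Hypothetical three-member forecast ensemble and one observation.
forecast = [1.0, 2.0, 3.0]
analysis = ensrf_scalar_update(forecast, obs=4.0, obs_var=1.0)
print(mean(analysis), variance(analysis))
```

Because no random observation perturbations are drawn, the update is deterministic, which is the property that distinguishes square-root variants from the perturbed-observation ensemble Kalman filter; sensitivity to ensemble size enters through the sample variance `pf`.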
An effective ensemble should consist of a set of networks that are both accurate and diverse. We propose a novel clustering-based selective algorithm for constructing neural network ensembles, in which clustering is used to group trained networks according to similarity and the most accurate individual network from each cluster is selected to make up the ensemble. Empirical studies on regression with four typical datasets showed that this approach yields significantly smaller ensembles that achieve better performance than traditional ones such as Bagging and Boosting. The bias-variance decomposition of the predictive error shows that the success of the proposed approach may lie in its proper tuning of the bias/variance trade-off to reduce the prediction error (the sum of bias² and variance).
Shale gas reservoirs have been successfully developed due to the advancement of horizontal well drilling and multistage hydraulic fracturing techniques. However, optimizing the design of horizontal well drilling, hydraulic fracturing, and the operational schedule is a challenging problem. An ensemble-based optimization method (EnOpt) is proposed here to optimize the design of the hydraulically fractured horizontal well in a shale gas reservoir. The objective is to maximize the net present value (NPV), which requires a simulation model to predict the cumulative shale gas production. To accurately describe the geometry of the hydraulic fractures, the embedded discrete fracture modeling method (EDFM) is used to construct the shale gas simulation model. The effects of gas absorption, Knudsen diffusion, natural and hydraulic fractures, and gas-water two-phase flow are considered in the shale gas production system. To improve the parameter continuity and Gaussianity required by the EnOpt method, the Hough transformation parameterization is used to characterize the horizontal well. The results show that the proposed method can effectively optimize the design parameters of the hydraulically fractured horizontal well, and that the NPV can be improved greatly after optimization as the design parameters approach their optimal values.
In medical diagnosis, the problem of class imbalance is common. Although unlabeled data are abundant, labeled data are difficult and expensive to obtain. In this paper, an ensemble-based active learning algorithm is proposed to address the class imbalance problem. Artificial data are created according to the distribution of the training dataset to make the ensemble diverse, and the random subspace re-sampling method is used to reduce the data dimension. In selecting member classifiers based on misclassification cost estimation, the minority class is assigned higher weights for misclassification costs, while each testing sample has a variable penalty factor to induce the ensemble to correct the current error. In our experiments with UCI disease datasets, F-value and G-means, rather than classification accuracy, are used as the evaluation criteria. Compared with other ensemble methods, our method shows the best performance and needs fewer labeled samples.
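Why accuracy is a poor yardstick under class imbalance can be seen from a small worked example. The sketch below uses the common definitions (F-value as the weighted harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the abstract does not spell out its exact formulas, so these definitions and the confusion-matrix counts are assumptions.

```python
import math
from typing import Tuple


def f_value_and_g_mean(
    tp: int, fn: int, fp: int, tn: int, beta: float = 1.0
) -> Tuple[float, float]:
    """Return (F-value, G-mean) for the positive (minority) class.

    Zero denominators are treated as a score of 0.0 so that a classifier
    which never predicts the minority class is penalized, not rejected.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = beta ** 2 * precision + recall
    f_value = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f_value, g_mean


# Hypothetical screening result: 10 diseased and 990 healthy patients.
tp, fn, fp, tn = 6, 4, 6, 984
accuracy = (tp + tn) / (tp + fn + fp + tn)
f_value, g_mean = f_value_and_g_mean(tp, fn, fp, tn)
print(accuracy, f_value, g_mean)
```

Here accuracy is 0.99 even though four of the ten diseased patients are missed, while the F-value and G-mean are visibly lower because both depend on how well the minority class is recovered.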
Multi-label learning deals with objects associated with multiple class labels, and aims to induce a predictive model that can assign a set of relevant class labels to an unseen instance. Since each class might possess its own characteristics, the strategy of extracting label-specific features has been widely employed to improve the discrimination process in multi-label learning, where the predictive model is induced from tailored features specific to each class label instead of identical instance representations. As a representative approach, LIFT generates label-specific features by conducting clustering analysis. However, its performance may be degraded by the inherent instability of a single clustering algorithm. To address this, a novel multi-label learning approach named SENCE (stable label-Specific features gENeration for multi-label learning via mixture-based Clustering Ensemble) is proposed, which stabilizes the generation of label-specific features via clustering ensemble techniques. Specifically, more stable clustering results are obtained by first augmenting the original instance representation with cluster assignments from base clusters and then fitting a mixture model via the expectation-maximization (EM) algorithm. Extensive experiments on eighteen benchmark data sets show that SENCE performs better than LIFT and other well-established multi-label learning algorithms.
Fault diagnosis plays an important role in complicated industrial processes. It is a challenging task to detect, identify, and locate faults quickly and accurately in large-scale process systems. To solve this problem, a novel MultiBoost-based integrated ENN (extension neural network) fault diagnosis method is proposed. Fault data from complicated chemical processes have some difficult-to-handle characteristics, such as high dimensionality, non-linearity, and non-Gaussian distribution, so we use the margin discriminant projection (MDP) algorithm to reduce dimensions and extract the main features. Then, the affinity propagation (AP) clustering method is used to select core data and boundary data as training samples to reduce memory consumption and shorten learning time. Afterwards, an integrated ENN classifier based on the MultiBoost strategy is constructed to identify fault types. Artificial data sets are used to verify the effectiveness of the proposed method and to carry out a detailed sensitivity analysis of the key parameters. Finally, a real industrial system, the Tennessee Eastman (TE) process, is employed to evaluate the performance of the proposed method. The results show that the proposed method is efficient and capable of diagnosing various types of faults in complicated chemical processes.
The exponential growth of Internet and network usage has necessitated heightened security measures to protect against data and network breaches. Intrusions, executed through network packets, pose a significant challenge for firewalls to detect and prevent due to the similarity between legitimate and intrusion traffic. The vast network traffic volume also complicates most network monitoring systems and algorithms. Several intrusion detection methods have been proposed, with machine learning techniques regarded as promising for dealing with these incidents. This study presents an intrusion detection system (IDS) based on stacking ensemble learning with Random Forest, Decision Tree, and k-Nearest Neighbors as base learners. The proposed system employs pre-processing techniques to enhance classification efficiency and integrates seven machine learning algorithms. The stacking ensemble technique increases performance by incorporating three base models (Random Forest, Decision Tree, and k-Nearest Neighbors) and a meta-model represented by the Logistic Regression algorithm. Evaluated on the UNSW-NB15 dataset, the proposed IDS achieved an accuracy of 96.16% in the training phase and 97.95% in the testing phase, with precision of 97.78% and 98.40% for training and testing, respectively. The obtained results also demonstrate improvements in other measurement criteria.
The initial ensemble perturbations for an ensemble data assimilation system are expected to reasonably sample model uncertainty at the time of analysis to further reduce analysis uncertainty. Therefore, the careful choice of an initial ensemble perturbation method that dynamically cycles ensemble perturbations is required for the optimal performance of the system. Based on the multivariate empirical orthogonal function (MEOF) method, a new ensemble initialization scheme is developed to generate balanced initial perturbations for ensemble Kalman filter (EnKF) data assimilation, with reasonable consideration of the physical relationships between different model variables. The scheme is applied in assimilation experiments with a global spectral atmospheric model and real observations. The proposed perturbation method is compared with the commonly used method of spatially correlated random perturbations. The comparisons show that the model uncertainties prior to the first analysis time, which are forecasted from the balanced ensemble initial fields, maintain a much more reasonable spread and a more accurate forecast error covariance than those from the randomly perturbed initial fields. The analysis results are further improved by the balanced ensemble initialization scheme due to more accurate background information. In addition, a 20-day continuous assimilation experiment shows that the ensemble spreads for each model variable remain within reasonable ranges without additional perturbations or inflation during the assimilation cycles, while the ensemble spreads from the randomly perturbed initialization scheme decrease and collapse rapidly.
Residual useful life (RUL) prediction is a key issue for improving the efficiency of aircraft engines and reducing their maintenance cost. Owing to the variety of failure mechanisms and operating environments, applying classical models to RUL prediction for aircraft engines is fairly difficult. In this study, a novel RUL prognostics method that uses an ensemble recurrent neural network to process massive sensor data is proposed. First, sensor data obtained from the aircraft engines are preprocessed to eliminate singular values, reduce random fluctuation, and preserve the degradation trend of the raw sensor data. Second, three kinds of recurrent neural networks (RNN), namely ordinary RNN, long short-term memory (LSTM), and gated recurrent unit (GRU), are individually constructed. Third, an ensemble learning mechanism is designed to merge the above RNNs to produce a more accurate RUL prediction. The effectiveness of the proposed method is validated using two characteristically different turbofan engine datasets. Experimental results show that the proposed method performs competitively in comparison with typical methods reported in the literature.
An ensemble-based method for the observation system simulation experiment (OSSE) is employed to design optimal observation stations and assess the present observation stations in the northeastern South China Sea (SCS). We employed 20 years (1992-2012) of sea surface height (SSH) data to design an array to monitor intraseasonal to interannual variability. The results show that the most important key region is located northwest of Luzon Island (LI), where the energetic Luzon cyclonic gyre (LCG) occurs; other key regions include the edge of the LCG, the area northwest of the Luzon Strait (LS), and the area southwest of Taiwan, China. By contrast, we found that the present observation stations might oversample northwest of the LS and undersample northwest of LI. In addition, the optimal stations perform better over a larger area than the present stations. In the vertical direction, the key layer is located within the upper 200 m, of which the surface and subsurface layers are most valuable to the observing system.
Advanced Metering Infrastructure (AMI) is the metering network of the smart grid that enables bidirectional communications between each consumer's premises and the provider's control center. The massive amount of data collected supports the real-time decision-making required for diverse applications. The communication infrastructure relies on different network types, including the Internet. This makes the infrastructure vulnerable to various attacks, which could compromise security or have devastating effects. However, traditional machine learning solutions cannot adapt to the increasing complexity and diversity of attacks. The objective of this paper is to develop an Anomaly Detection System (ADS) based on deep learning using the CIC-IDS2017 dataset. Because this dataset is highly imbalanced, a two-step sampling technique, random under-sampling followed by the Synthetic Minority Oversampling Technique (SMOTE), is proposed to balance the dataset. The proposed system utilizes a multiple-hidden-layer Auto-encoder (AE) for feature extraction and dimensional reduction. In addition, an ensemble voting scheme based on both Random Forest (RF) and Convolutional Neural Network (CNN) is developed to classify the multiclass attack categories. The proposed system is evaluated and compared with six different state-of-the-art machine learning and deep learning algorithms: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and bidirectional LSTM (biLSTM). Experimental results show that the proposed model enhances detection for each attack class compared with the other machine learning and deep learning models, with an overall accuracy of 98.29%, a precision of 99%, a recall of 98%, an F1 score of 98%, and an undetection rate (UND) of 8%.
Funding (hot-rolled steel surface defect classification study): Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2022R1I1A3063493).
Funding: The University of Transport Technology under the project entitled “Application of Machine Learning Algorithms in Landslide Susceptibility Mapping in Mountainous Areas” with grant number DTTD2022-16.
Abstract: This study aimed to prepare landslide susceptibility maps for the Pithoragarh district in Uttarakhand, India, using advanced ensemble models that combined Radial Basis Function Networks (RBFN) with three ensemble learning techniques: DAGGING (DG), MULTIBOOST (MB), and ADABOOST (AB). This combination resulted in three distinct ensemble models: DG-RBFN, MB-RBFN, and AB-RBFN. Additionally, a traditional weighted method, Information Value (IV), and a benchmark machine learning (ML) model, Multilayer Perceptron Neural Network (MLP), were employed for comparison and validation. The models were developed using ten landslide conditioning factors: slope, aspect, elevation, curvature, land cover, geomorphology, overburden depth, lithology, distance to rivers, and distance to roads. These factors were used to predict the output variable, the probability of landslide occurrence. Statistical analysis of the models' performance indicated that the DG-RBFN model, with an Area Under the ROC Curve (AUC) of 0.931, outperformed the other models. The AB-RBFN model achieved an AUC of 0.929, the MB-RBFN model an AUC of 0.913, and the MLP model an AUC of 0.926. These results suggest that the advanced ensemble ML model DG-RBFN was more accurate than the traditional statistical model, the single MLP model, and the other ensemble models in preparing trustworthy landslide susceptibility maps, thereby supporting land use planning and decision-making.
Funding: Supported by National Research Foundation of Korea (NRF) grants funded by the Korea government (MEST) (Nos. 2015R1A3A2031159 and 2016R1A5A1008055).
Abstract: Existing web-based security applications have failed in many situations because of the sophistication of attackers. Cross-Site Scripting (XSS) is one of the dangerous attacks on web applications, in which an organization's or user's information is modified. To address these security challenges, this article proposes a novel combination of machine learning (NB, SVM, k-NN) and deep learning (RNN, CNN, LSTM) frameworks for detecting and defending against XSS attacks with high accuracy and efficiency. Based on this representation, a novel idea for merging a stacking ensemble with web applications, termed “hybrid stacking”, is proposed. To implement these methods, four distinct datasets, each containing both safe and unsafe content, are considered. The hybrid detection method can adaptively identify attacks from the URL, and the defense mechanism inherits the advantages of URL encoding with dictionary-based mapping to improve prediction accuracy, accelerate the training process, and effectively remove unsafe JScript/JavaScript keywords from the URL. The simulation results show that the proposed hybrid model is more efficient than existing detection methods. It classifies XSS attacks with scores above 99.5% (accuracy, precision, recall, F1-score, and Receiver Operating Characteristic (ROC)) and is highly resistant to XSS attacks. To ensure the security of the server's information, the proposed hybrid approach is demonstrated in a real-time environment.
Abstract: With the advancement of artificial intelligence, traffic forecasting is gaining more and more interest for optimizing route planning and enhancing service quality. Traffic volume is an influential parameter for planning and operating traffic structures. This study proposes an improved ensemble-based deep learning method to solve traffic volume prediction problems. A set of optimal hyperparameters is also applied to the suggested approach to improve the performance of the learning process. The fusion of these methodologies aims to harness the capacity of ensemble empirical mode decomposition to discern complex traffic patterns and the proficiency of long short-term memory in learning temporal relationships. First, a dataset from automatic vehicle identification is obtained and used in the preprocessing stage of the ensemble empirical mode decomposition model. Second, traffic volume is predicted using the long short-term memory algorithm. Next, the study employs a trial-and-error approach to select a set of optimal hyperparameters, including the lookback window, the number of neurons in the hidden layers, and the gradient descent optimization. Finally, the fusion of the obtained results leads to a final traffic volume prediction. The experimental results show that the proposed method outperforms other benchmarks on various evaluation measures, including mean absolute error, root mean squared error, mean absolute percentage error, and R-squared. The achieved R-squared value reaches 98%, while the other evaluation indices surpass those of the competing methods. These findings highlight the accuracy of the traffic pattern prediction and offer promising prospects for enhancing transportation management systems and urban infrastructure planning.
Funding: The National Natural Science Foundation of China (No. 61300167); the Open Project Program of State Key Laboratory for Novel Software Technology of Nanjing University (No. KFKT2015B17); the Natural Science Foundation of Jiangsu Province (No. BK20151274); Qing Lan Project of Jiangsu Province; the Open Project Program of Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education (No. JYB201606); the Program for Special Talent in Six Fields of Jiangsu Province (No. XYDXXJS-048).
Abstract: To improve the performance of attribute reduction algorithms on noisy and uncertain large data, a novel co-evolutionary cloud-based attribute ensemble multi-agent reduction (CCAEMR) algorithm is proposed. First, a co-evolutionary cloud framework is designed under the MapReduce mechanism to divide the entire population into different co-evolutionary subpopulations with a self-adaptive scale. Meanwhile, these subpopulations share their rewards to accelerate attribute reduction. Second, a multi-agent ensemble strategy of co-evolutionary elitist optimization is constructed to ensure that subpopulations can exploit any correlation and interdependency between interacting attribute subsets with reinforced noise tolerance. Hence, these agents are kept within the stable elitist region to achieve the optimal profit. The experimental results show that the proposed CCAEMR algorithm has better efficiency and feasibility in solving large-scale and uncertain dataset problems with complex noise.
Funding: Financially supported by the National Key R&D Program of China (Grant No. 2022YFF1202600); the National Natural Science Foundation of China (Grant No. 82301158); Science and Technology Innovation Action Plan of Shanghai Science and Technology Committee (Grant No. 22015820100); Two-hundred Talent Support (Grant No. 20152224); Translational Medicine Innovation Project of Shanghai Jiao Tong University School of Medicine (Grant No. TM201915); Clinical Research Project of Multi-Disciplinary Team, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (Grant No. 201914); China Postdoctoral Science Foundation (Grant No. 2023M742332).
Abstract: Essential proteins are crucial for biological processes and can be identified through both experimental and computational methods. While experimental approaches are highly accurate, they often demand extensive time and resources. To address these challenges, we present a computational ensemble learning framework designed to identify essential proteins more efficiently. Our method begins by using node2vec to transform proteins in the protein-protein interaction (PPI) network into continuous, low-dimensional vectors. We also extract a range of features from protein sequences, including graph-theory-based, information-based, compositional, and physicochemical attributes. Additionally, we use deep learning techniques to analyze high-dimensional position-specific scoring matrices (PSSMs) and capture evolutionary information. We then combine these features for classification using various machine learning algorithms. To enhance performance, we integrate the outputs of these algorithms through ensemble methods such as voting, weighted averaging, and stacking. This approach effectively addresses data imbalance and improves both robustness and accuracy. Our ensemble learning framework achieves an AUC of 0.960 and an accuracy of 0.9252, outperforming other computational methods. These results demonstrate the effectiveness of our approach in accurately identifying essential proteins and highlight its superior feature extraction capabilities.
Funding: Sponsored by the Special Fund for Meteorological Research in the Public Interest from the Ministry of Science and Technology of China (Grant No. GYHY201306004), the National Key Basic Research Program of China (Grant No. 2013CB430104), and the National Natural Science Foundation of China (Grant Nos. 41461164006, 41375048, and 41425018); also supported by the Ministry of Science and Technology of Taiwan (Grant No. MOST103-2111-M-002-011-MY3).
Abstract: This study explored the impact of coastal radar observability on the forecast of the track and rainfall of Typhoon Morakot (2009) using a WRF-based ensemble Kalman filter (EnKF) data assimilation (DA) system. The results showed that the performance of radar EnKF DA was quite sensitive to the number of radars being assimilated and to the DA timing relative to the landfall of the tropical cyclone (TC). Assimilating radial velocity (Vr) data from all four operational radars during the 6 h immediately before TC landfall was found to be important for the track and rainfall forecasts after landfall. The TC track forecast error could be decreased by about 43%, and the 24-h rainfall forecast skill could be almost tripled. Assimilating Vr data from a single radar outperformed the experiment without DA, though with less improvement than the multiple-radar DA experiment. Different forecast performances were obtained by assimilating different radars, which was closely related to the first-time wind analysis increment, the location of moisture transport, the quasi-stationary rainband, and the local convergence line. However, assimilating Vr data only when the TC was farther from landfall might worsen the TC track and rainfall forecasts. This work also demonstrated that Vr data from multiple radars, instead of a single radar, should be used for verification to obtain a more reliable assessment of EnKF performance.
Funding: Projects (41807259, 51604109) supported by the National Natural Science Foundation of China; Project (2020CX040) supported by the Innovation-Driven Project of Central South University, China; Project (2018JJ3693) supported by the Natural Science Foundation of Hunan Province, China.
Abstract: Rockburst prediction is of vital significance to the design and construction of underground hard rock mines. A rockburst database consisting of 102 case histories, i.e., 1998-2011 period data from 14 hard rock mines, was examined for rockburst prediction in burst-prone mines by three tree-based ensemble methods. The dataset was examined with six widely accepted indices: the maximum tangential stress around the excavation boundary (MTS), the uniaxial compressive strength (UCS) and uniaxial tensile strength (UTS) of the intact rock, the stress concentration factor (SCF), the rock brittleness index (BI), and the strain energy storage index (EEI). Two boosting algorithms (AdaBoost.M1 and SAMME) and a bagging algorithm, each with classification trees as the base classifier, were evaluated on their ability to learn rockburst. The available dataset was randomly divided into a training set (2/3 of the whole dataset) and a testing set (the remainder). Repeated 10-fold cross validation (CV) was applied as the validation method for tuning the hyper-parameters. Margin analysis and variable relative importance were employed to analyze some characteristics of the ensembles. According to 10-fold CV, the accuracy analysis of the rockburst dataset demonstrated that bagging is the best method for predicting rockburst potential when compared with the AdaBoost.M1 and SAMME algorithms and with empirical criteria methods.
Funding: Supported by the National Fundamental Research (973) Program of China (Grant No. 2013CB430103), the Special Foundation of the China Meteorological Administration (Grant No. GYHY201506006), and the National Science Foundation of China (Grant No. 41405100).
Abstract: On 21 July 2012, an extreme rainfall event with a recorded maximum rainfall amount of 460 mm over 24 hours occurred in Beijing, China. Most operational models failed to predict such an extreme amount. In this study, a convective-permitting ensemble forecast system (CEFS), at 4-km grid spacing and covering the entire mainland of China, is applied to this extreme rainfall case. CEFS consists of 22 members and uses multiple physics parameterizations. For the event, the predicted maximum is 415 mm d⁻¹ in the probability-matched ensemble mean. The predicted high-probability heavy rain region is located in southwest Beijing, as was observed. Ensemble-based verification scores are then investigated. For a small verification domain covering Beijing and its surrounding areas, the precipitation rank histogram of CEFS is much flatter than that of a reference global ensemble. CEFS has a lower (higher) Brier score and a higher resolution than the global ensemble for precipitation, indicating more reliable probabilistic forecasting by CEFS. Additionally, forecasts of different ensemble members are compared and discussed. Most of the extreme rainfall comes from convection in the warm sector east of an approaching cold front. A few members of CEFS successfully reproduce such precipitation, and orographic lift of highly moist low-level flows with a significant southeasterly component is suggested to have played an important role in producing the initial convection. Comparisons between good and bad forecast members indicate a strong sensitivity of the extreme rainfall to the mesoscale environmental conditions and, to a lesser extent, to the model physics.
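For readers unfamiliar with the Brier score cited in the abstract above, the following minimal sketch shows how it can be computed from ensemble members: the forecast probability is taken as the fraction of members at or above a rainfall threshold, and the score is the mean squared difference between that probability and the observed 0/1 outcome (lower is better). The function names, the 50 mm threshold, and all rainfall values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def exceedance_probability(member_values: Sequence[float], threshold: float) -> float:
    """Fraction of ensemble members forecasting at least `threshold`."""
    return sum(v >= threshold for v in member_values) / len(member_values)


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(probabilities)


# Hypothetical 24-h rainfall forecasts (mm) from a 4-member ensemble at 3 grid points.
ensemble = [
    [60.0, 55.0, 40.0, 70.0],
    [10.0, 20.0, 5.0, 60.0],
    [0.0, 0.0, 10.0, 5.0],
]
observed = [80.0, 30.0, 2.0]  # hypothetical observed rainfall (mm)
threshold = 50.0              # hypothetical heavy-rain threshold (mm)

probs: List[float] = [exceedance_probability(m, threshold) for m in ensemble]
events = [int(o >= threshold) for o in observed]
score = brier_score(probs, events)

print(probs, events, score)
```

A perfect probabilistic forecast would score 0, and a forecast that always issues 0.5 would score 0.25, which gives a rough scale for comparing two ensembles on the same events.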
Funding: Supported by the U.S. National Science Foundation through Award Number ATM-0833985.
Abstract: This study examines the effectiveness of ensemble Kalman filters in data assimilation with the strongly nonlinear dynamics of the Lorenz-63 model, and in particular their use in predicting the regime transition that occurs when the model jumps from one basin of attraction to the other. Four configurations of ensemble-based Kalman filtering data assimilation techniques, namely the ensemble Kalman filter, the ensemble adjustment Kalman filter, the ensemble square root filter, and the ensemble transform Kalman filter, are evaluated on their ability to predict the regime transition (also called phase transition) and are compared in terms of their sensitivity to both observational and sampling errors. The sensitivity of each ensemble-based filter to the size of the ensemble is also examined.
Abstract: An effective ensemble should consist of a set of networks that are both accurate and diverse. We propose a novel clustering-based selective algorithm for constructing neural network ensembles, in which clustering is used to group trained networks according to similarity and the most accurate individual network is selected from each cluster to make up the ensemble. Empirical studies on regression with four typical datasets showed that this approach yields a significantly smaller ensemble that achieves better performance than traditional approaches such as Bagging and Boosting. The bias-variance decomposition of the predictive error shows that the success of the proposed approach may lie in properly tuning the bias/variance trade-off to reduce the prediction error (the sum of bias² and variance).
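The decomposition referred to in the abstract above can be written out as follows. This is the standard noise-free form for squared error, given here as an assumption about what the abstract means: $f(x)$ is the target function, $\hat{f}_D(x)$ is the predictor trained on a random training set $D$, and the notation is ours, not the paper's.

```latex
\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - f(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
```

When the targets carry observation noise of variance $\sigma^2$, the expected error against the noisy targets gains an additional irreducible $\sigma^2$ term that no choice of ensemble can remove.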
Funding: This work is funded by the National Science and Technology Major Project of China (Grant Nos. 2016ZX05037003-003 and 2017ZX05032004-002); PetroChina Innovation Foundation (Grant No. 2020D-5007-0203); the National Natural Science Foundation of China (Grant No. 51374222); the Sinopec fundamental perspective research project (Grant No. P18086-5); Joint Funds of the National Natural Science Foundation of China (U19B6003-02-05); and supported by the Science Foundation of China University of Petroleum, Beijing (Nos. 2462018QZDX13 and 2462020YXZZ028).
Abstract: Shale gas reservoirs have been successfully developed thanks to advances in horizontal well drilling and multistage hydraulic fracturing techniques. However, the optimization design of the horizontal well drilling, hydraulic fracturing, and operational schedule is a challenging problem. An ensemble-based optimization method (EnOpt) is proposed here to optimize the design of the hydraulically fractured horizontal well in the shale gas reservoir. The objective is to maximize the net present value (NPV), which requires a simulation model to predict the cumulative shale gas production. To accurately describe the geometry of the hydraulic fractures, the embedded discrete fracture modeling method (EDFM) is used to construct the shale gas simulation model. The effects of gas absorption, Knudsen diffusion, natural and hydraulic fractures, and gas-water two-phase flow are considered in the shale gas production system. To improve the parameter continuity and Gaussianity required by the EnOpt method, the Hough transformation parameterization is used to characterize the horizontal well. The results show that the proposed method can effectively optimize the design parameters of the hydraulically fractured horizontal well, and that the NPV can be improved greatly after optimization as the design parameters approach their optimal values.
Abstract: In medical diagnosis, the problem of class imbalance is common. Although unlabeled data are abundant, labeled data are difficult and expensive to obtain. In this paper, an ensemble-based active learning algorithm is proposed to address the class imbalance problem. Artificial data are created according to the distribution of the training dataset to make the ensemble diverse, and the random subspace re-sampling method is used to reduce the data dimension. In selecting member classifiers based on misclassification cost estimation, the minority class is assigned higher weights for misclassification costs, while each testing sample has a variable penalty factor to induce the ensemble to correct its current error. In our experiments with UCI disease datasets, F-value and G-means are used as the evaluation criteria instead of classification accuracy. Compared with other ensemble methods, our method shows the best performance and needs fewer labeled samples.
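To illustrate why the abstract above prefers F-value and G-mean to accuracy on imbalanced data, the sketch below computes all three from a confusion matrix. The definitions used are the common ones (F-measure as the weighted harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the paper may use a variant, and the labels and helper names here are hypothetical.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating `positive` as the minority class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score; returns 0.0 when precision or recall is undefined."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(tp: int, fp: int, fn: int, tn: int) -> float:
    """Geometric mean of sensitivity (recall) and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical labels: 2 diseased (1) and 8 healthy (0) patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
f1 = f_measure(tp, fp, fn)
gm = g_mean(tp, fp, fn, tn)

print(accuracy, f1, gm)
```

In this made-up case accuracy is 0.8 even though half of the diseased patients are missed, whereas the F-value of 0.5 reflects the weak minority-class performance more directly.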
Funding: This work was supported by the National Science Foundation of China (62176055) and the China University S&T Innovation Plan Guided by the Ministry of Education.
Abstract: Multi-label learning deals with objects associated with multiple class labels and aims to induce a predictive model that can assign a set of relevant class labels to an unseen instance. Since each class might possess its own characteristics, the strategy of extracting label-specific features has been widely employed to improve the discrimination process in multi-label learning, where the predictive model is induced from tailored features specific to each class label instead of identical instance representations. As a representative approach, LIFT generates label-specific features by conducting clustering analysis. However, its performance may be degraded by the inherent instability of a single clustering algorithm. To improve this, a novel multi-label learning approach named SENCE (stable label-Specific features gENeration for multi-label learning via mixture-based Clustering Ensemble) is proposed, which stabilizes the generation of label-specific features via clustering ensemble techniques. Specifically, more stable clustering results are obtained by first augmenting the original instance representation with cluster assignments from base clusters and then fitting a mixture model via the expectation-maximization (EM) algorithm. Extensive experiments on eighteen benchmark data sets show that SENCE performs better than LIFT and other well-established multi-label learning algorithms.
Funding: Project (61203021) supported by the National Natural Science Foundation of China; Project (2011216011) supported by the Key Science and Technology Program of Liaoning Province, China; Project (2013020024) supported by the Natural Science Foundation of Liaoning Province, China; Project (LJQ2015061) supported by the Program for Liaoning Excellent Talents in Universities, China.
Abstract: Fault diagnosis plays an important role in complicated industrial processes. Detecting, identifying, and locating faults quickly and accurately in a large-scale process system is a challenging task. To solve the problem, a novel MultiBoost-based integrated ENN (extension neural network) fault diagnosis method is proposed. Fault data from complicated chemical processes have characteristics that are difficult to handle, such as high dimensionality and non-linear, non-Gaussian distributions, so the margin discriminant projection (MDP) algorithm is used to reduce dimensions and extract the main features. Then, the affinity propagation (AP) clustering method is used to select core data and boundary data as training samples, reducing memory consumption and shortening learning time. Afterwards, an integrated ENN classifier based on the MultiBoost strategy is constructed to identify fault types. Artificial data sets are tested to verify the effectiveness of the proposed method and to carry out a detailed sensitivity analysis of the key parameters. Finally, a real industrial system, the Tennessee Eastman (TE) process, is employed to evaluate the performance of the proposed method. The results show that the proposed method is efficient and capable of diagnosing various types of faults in complicated chemical processes.
Abstract: The exponential growth of Internet and network usage has necessitated heightened security measures to protect against data and network breaches. Intrusions, executed through network packets, are difficult for firewalls to detect and prevent because legitimate and intrusion traffic are similar. The vast volume of network traffic also complicates most network monitoring systems and algorithms. Several intrusion detection methods have been proposed, with machine learning techniques regarded as promising for dealing with these incidents. This study presents an intrusion detection system (IDS) based on stacking ensemble learning with three base models (Random Forest, Decision Tree, and k-Nearest-Neighbors). The proposed system employs pre-processing techniques to enhance classification efficiency and integrates seven machine learning algorithms. The stacking ensemble technique increases performance by incorporating the three base models and a meta-model, the Logistic Regression algorithm. Evaluated on the UNSW-NB15 dataset, the proposed IDS achieved an accuracy of 96.16% in the training phase and 97.95% in the testing phase, with precision of 97.78% and 98.40% for training and testing, respectively. The results also demonstrate improvements in other measurement criteria.
Funding: Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KZCX1-YW-12-03), the National Basic Research Program of China (Grant No. 2010CB951901), and the National Natural Science Foundation of China (Grant No. 40805033).
Abstract: The initial ensemble perturbations for an ensemble data assimilation system are expected to reasonably sample model uncertainty at the time of analysis to further reduce analysis uncertainty. Therefore, the careful choice of an initial ensemble perturbation method that dynamically cycles ensemble perturbations is required for the optimal performance of the system. Based on the multivariate empirical orthogonal function (MEOF) method, a new ensemble initialization scheme is developed to generate balanced initial perturbations for ensemble Kalman filter (EnKF) data assimilation, with reasonable consideration of the physical relationships between different model variables. The scheme is applied in assimilation experiments with a global spectral atmospheric model and real observations. The proposed perturbation method is compared with the commonly used method of spatially correlated random perturbations. The comparisons show that the model uncertainties prior to the first analysis time, which are forecast from the balanced ensemble initial fields, maintain a much more reasonable spread and a more accurate forecast error covariance than those from the randomly perturbed initial fields. The analysis results are further improved by the balanced ensemble initialization scheme because of the more accurate background information. In addition, a 20-day continuous assimilation experiment shows that the ensemble spreads for each model variable remain within reasonable ranges without additional perturbations or inflation during the assimilation cycles, while the ensemble spreads from the randomly perturbed initialization scheme decrease and collapse rapidly.
Funding: The National Natural Science Foundation of China (Nos. 11672098, 11502063) and the Natural Science Foundation of Anhui Province (No. 1608085QA07).
Abstract: Residual useful life (RUL) prediction is a key issue for improving the efficiency of aircraft engines and reducing their maintenance cost. Owing to the variety of failure mechanisms and operating environments, applying classical models to RUL prediction of aircraft engines is fairly difficult. In this study, a novel RUL prognostics method that uses an ensemble recurrent neural network to process massive sensor data is proposed. First, sensor data obtained from the aircraft engines are preprocessed to eliminate singular values, reduce random fluctuation, and preserve the degradation trend of the raw sensor data. Second, three kinds of recurrent neural networks (RNN), namely the ordinary RNN, long short-term memory (LSTM), and the gated recurrent unit (GRU), are individually constructed. Third, an ensemble learning mechanism is designed to merge these RNNs to produce a more accurate RUL prediction. The effectiveness of the proposed method is validated using two characteristically different turbofan engine datasets. Experimental results show that the proposed method performs competitively with typical methods reported in the literature.
Funding: Supported by the National Key Research & Development Plan of China (Nos. 2016YFC1401703, 2016YFC1401702, 2018YFC0309803); the National Natural Science Foundation of China (Nos. 41506002, 41676010, 41476011, 41676015, 41606026); the Institution of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences (No. ISEE2019ZR0); and the Guangzhou Science and Technology Foundation (No. 201804010133).
Abstract: An ensemble-based method for the observation system simulation experiment (OSSE) is employed to design optimal observation stations and to assess the present observation stations in the northeastern South China Sea (SCS). We used 20 years (1992-2012) of sea surface height (SSH) data to design an array for monitoring intraseasonal to interannual variability. The results show that the most important key region is located northwest of Luzon Island (LI), where the energetic Luzon cyclonic gyre (LCG) occurs; other key regions include the edge of the LCG, the area northwest of the Luzon Strait (LS), and the area southwest of Taiwan, China. By contrast, we found that the present observation stations might oversample northwest of the LS and undersample northwest of LI. In addition, the optimal stations perform better over a larger area than the present stations. In the vertical direction, the key layer is located within the upper 200 m, of which the surface and subsurface layers are the most valuable to the observing system.
Abstract: Advanced Metering Infrastructure (AMI) is the metering network of the smart grid that enables bidirectional communications between each consumer's premises and the provider's control center. The massive amount of data collected supports the real-time decision-making required for diverse applications. The communication infrastructure relies on different network types, including the Internet. This makes the infrastructure vulnerable to various attacks, which could compromise security or have devastating effects. However, traditional machine learning solutions cannot adapt to the increasing complexity and diversity of attacks. The objective of this paper is to develop an Anomaly Detection System (ADS) based on deep learning using the CIC-IDS2017 dataset. Because this dataset is highly imbalanced, a two-step sampling technique, random under-sampling followed by the Synthetic Minority Oversampling Technique (SMOTE), is proposed to balance it. The proposed system uses an Auto-encoder (AE) with multiple hidden layers for feature extraction and dimensionality reduction. In addition, an ensemble voting scheme based on both Random Forest (RF) and a Convolutional Neural Network (CNN) is developed to classify the multiclass attack categories. The proposed system is evaluated and compared with six state-of-the-art machine learning and deep learning algorithms: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and bidirectional LSTM (biLSTM). Experimental results show that the proposed model improves the detection of each attack class compared with the other machine learning and deep learning models, with overall accuracy (98.29%), precision (99%), recall (98%), F1 score (98%), and UNDetection rate (UND) (8%).
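The two-step balancing idea in the abstract above, random under-sampling of the majority class followed by SMOTE-style over-sampling of the minority class, can be sketched in plain Python as follows. This is a simplified illustration, not the paper's pipeline: the helper names, the toy two-dimensional data, the target class size of 20, and the choice of k = 3 neighbours are all assumptions, and the minority class must contain at least two samples for the interpolation to work.

```python
import random
from typing import List

Point = List[float]


def random_under_sample(samples: List[Point], target: int, rng: random.Random) -> List[Point]:
    """Keep a random subset of at most `target` samples (without replacement)."""
    return rng.sample(samples, min(target, len(samples)))


def smote(minority: List[Point], target: int, k: int, rng: random.Random) -> List[Point]:
    """Grow `minority` to `target` samples by interpolating toward near neighbours.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    """
    result = list(minority)
    while len(result) < target:
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        result.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return result


rng = random.Random(0)
# Hypothetical toy data: 100 benign points and 5 attack points in 2-D.
majority = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(100)]
minority = [[rng.uniform(5, 6), rng.uniform(5, 6)] for _ in range(5)]

balanced_majority = random_under_sample(majority, target=20, rng=rng)
balanced_minority = smote(minority, target=20, k=3, rng=rng)

print(len(balanced_majority), len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority samples, the generated data stay inside the region already occupied by the minority class, which is the property the test below checks.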