The distribution of the nuclear ground-state spin in a two-body random ensemble (TBRE) was studied using a general classification neural network (NN) model with two-body interaction matrix elements as input features and the corresponding ground-state spins as labels or output predictions. The quantum many-body problem exceeds the capability of our optimized NNs in terms of accurately predicting the ground-state spin of each sample within the TBRE. However, our NN model effectively captured the statistical properties of the ground-state spin because it learned the empirical regularity of the ground-state spin distribution in the TBRE, as discovered by physicists.
Forecasting of ocean currents is critical for both marine meteorological research and ocean engineering and construction. Timely and accurate forecasting of coastal current velocities offers a scientific foundation and decision support for multiple practices such as search and rescue, disaster avoidance and remediation, and offshore construction. This research established a framework to generate short-term surface current forecasts based on ensemble machine learning trained on high-frequency radar observations. Results indicate that an ensemble algorithm that uses random forests to filter forecasting features by weighting them, and then uses the AdaBoost method to forecast, can significantly reduce model training time while preserving forecasting effectiveness, with great economic benefits. Model accuracy is a function of surface current variability and the forecasting horizon. To improve the forecasting capability and accuracy of the model, the structure of the ensemble algorithm was optimized, and the random forest algorithm was used to dynamically select model features. The results show that the error of the optimized surface current forecasting model varies more regularly, and that the importance of the features varies with the forecasting time-step. At the ten-step-ahead forecasting horizon, the model reported a root mean square error, mean absolute error, and correlation coefficient of 2.84 cm/s, 2.02 cm/s, and 0.96, respectively. The model error is affected by factors such as topography, boundaries, and the geometric accuracy of the observation system. This paper demonstrates the potential of ensemble-based machine learning algorithms to improve forecasting of ocean currents.
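The three scores quoted in the abstract above (root mean square error, mean absolute error, and correlation coefficient) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `forecast_errors` and the sample velocities are hypothetical, and the correlation coefficient is assumed to be Pearson's r.

```python
import math

def forecast_errors(observed, predicted):
    """Return (RMSE, MAE, Pearson r) for two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    diffs = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Pearson correlation: covariance divided by the product of the spreads.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return rmse, mae, r

# Hypothetical current speeds in cm/s (illustrative values only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
rmse, mae, r = forecast_errors(observed, predicted)
print(f"RMSE={rmse:.2f} cm/s, MAE={mae:.2f} cm/s, r={r:.2f}")
```

RMSE penalizes large misses more heavily than MAE, which is one reason the two are commonly reported together.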
Thyroid disorders represent a significant global health challenge, with hypothyroidism and hyperthyroidism as two common conditions arising from dysfunction in the thyroid gland. Accurate and timely diagnosis of these disorders is crucial for effective treatment and patient care. This research introduces a comprehensive approach to improve the accuracy of thyroid disorder diagnosis through the integration of ensemble stacking and advanced feature selection techniques. Sequential forward feature selection, sequential backward feature elimination, and bidirectional feature elimination are investigated in this study. In ensemble learning, random forest, adaptive boosting, and bagging classifiers are employed. The effectiveness of these techniques is evaluated using two different datasets obtained from the University of California Irvine Machine Learning Repository, both of which undergo preprocessing steps, including outlier removal, addressing missing data, data cleansing, and feature reduction. Extensive experimentation demonstrates the remarkable success of the proposed ensemble stacking and bidirectional feature elimination, achieving 100% and 99.86% accuracy in identifying hyperthyroidism and hypothyroidism, respectively. Beyond enhancing detection accuracy, the ensemble stacking model also demonstrated a streamlined computational complexity, which is pivotal for practical medical applications. It significantly outperformed existing studies with similar objectives, underscoring the viability and effectiveness of the proposed scheme. This research offers an innovative perspective and sets the platform for improved thyroid disorder diagnosis, with broader implications for healthcare and patient well-being.
As the global demand for renewable energy grows, solar energy is gaining attention as a clean, sustainable energy source. Accurate assessment of solar energy resources is crucial for the siting and design of photovoltaic power plants. This study proposes an integrated deep learning-based photovoltaic resource assessment method. Ensemble learning and deep learning methods are fused for photovoltaic resource assessment for the first time. The proposed method combines the random forest, gated recurrent unit, and long short-term memory to effectively improve the accuracy and reliability of photovoltaic resource assessment. The proposed method has strong adaptability and high accuracy even in the photovoltaic resource assessment of complex terrain and landscape. The experimental results show that the proposed method outperforms the comparison algorithm in all evaluation indexes, indicating that it has higher accuracy and reliability in photovoltaic resource assessment, with improved generalization performance over traditional single algorithms.
This study explores the area of Author Profiling (AP) and its importance in several industries, including forensics, security, marketing, and education. A key component of AP is the extraction of useful information from text, with an emphasis on the writers' ages and genders. To improve the accuracy of AP tasks, the study develops an ensemble model dubbed ABMRF that combines AdaBoostM1 (ABM1) and Random Forest (RF). The work uses an extensive technique that involves text message dataset pretreatment, model training, and assessment. Several machine learning (ML) algorithms are compared to evaluate their effectiveness in classifying age and gender, including Composite Hypercube on Random Projection (CHIRP), Decision Trees (J48), Naïve Bayes (NB), K Nearest Neighbor, AdaBoostM1, NB-Updatable, RF, and ABMRF. The findings demonstrate that ABMRF consistently beats the competition, with a gender classification accuracy of 71.14% and an age classification accuracy of 54.29%. Additional metrics such as precision, recall, F-measure, Matthews Correlation Coefficient (MCC), and accuracy support ABMRF's outstanding performance in age and gender profiling tasks. This study demonstrates the usefulness of ABMRF as an ensemble model for author profiling and highlights its possible uses in marketing, law enforcement, and education. The results emphasize the effectiveness of ensemble approaches in enhancing author profiling task accuracy, particularly when it comes to age and gender identification.
This paper adopts the NGI-ADP soil model to carry out finite element analysis, based on which the effects of soft clay anisotropy on the diaphragm wall deflections in the braced excavation were evaluated. More than one thousand finite element cases were numerically analyzed, followed by extensive parametric studies. Surrogate models were developed via ensemble learning methods (ELMs), including eXtreme Gradient Boosting (XGBoost) and Random Forest Regression (RFR), to predict the maximum lateral wall deformation (δhmax). Then the results of the ELMs were compared with conventional soft computing methods such as Decision Tree Regression (DTR), Multilayer Perceptron Regression (MLPR), and Multivariate Adaptive Regression Splines (MARS). This study presents a cutting-edge application of ensemble learning in geotechnical engineering and a reasonable methodology that allows engineers to determine the wall deflection in a fast, alternative way.
With the rapid development of blockchain technology, research on blockchain and its security theory, along with its practical application, has become crucial. At present, a new kind of DDoS attack has arisen: the DDoS attack in blockchain networks. The attack is harmful to blockchain technology and many application scenarios. However, traditional and existing DDoS attack detection and defense means mainly come from centralized tactics and solutions. Aiming at the above problem, the paper proposes the virtual reality parallel anti-DDoS chain design philosophy and a distributed anti-D chain detection framework based on hybrid ensemble learning. Here, AdaBoost and Random Forest are used as our ensemble learning strategy, and several different lightweight classifiers, such as CART and ID3, are integrated into the same ensemble learning algorithm. Our detection framework in the blockchain scene has much stronger generalization performance, universality, and complementarity to accurately identify the onslaught features of DDoS attacks in P2P networks. Extensive experimental results confirm that our distributed heterogeneous anti-D chain detection method has better performance in six important indicators (Precision, Recall, F-Score, True Positive Rate, False Positive Rate, and ROC curve).
The exponential growth of Internet and network usage has necessitated heightened security measures to protect against data and network breaches. Intrusions, executed through network packets, pose a significant challenge for firewalls to detect and prevent due to the similarity between legitimate and intrusion traffic. The vast network traffic volume also complicates most network monitoring systems and algorithms. Several intrusion detection methods have been proposed, with machine learning techniques regarded as promising for dealing with these incidents. This study presents an intrusion detection system (IDS) based on stacking ensemble learning with Random Forest, Decision Tree, and k-Nearest-Neighbors as base models. The proposed system employs pre-processing techniques to enhance classification efficiency and integrates seven machine learning algorithms. The stacking ensemble technique increases performance by incorporating three base models (Random Forest, Decision Tree, and k-Nearest-Neighbors) and a meta-model represented by the Logistic Regression algorithm. Evaluated using the UNSW-NB15 dataset, the proposed IDS gained an accuracy of 96.16% in the training phase and 97.95% in the testing phase, with precision of 97.78% and 98.40% for training and testing, respectively. The obtained results demonstrate improvements in other measurement criteria.
In medical diagnosis, the problem of class imbalance is common. Though there are abundant unlabeled data, it is very difficult and expensive to get labeled ones. In this paper, an ensemble-based active learning algorithm is proposed to address the class imbalance problem. Artificial data are created according to the distribution of the training dataset to make the ensemble diverse, and the random subspace re-sampling method is used to reduce the data dimension. In selecting member classifiers based on misclassification cost estimation, the minority class is assigned higher weights for misclassification costs, while each testing sample has a variable penalty factor to induce the ensemble to correct the current error. In our experiments with UCI disease datasets, F-value and G-means are used as the evaluation criteria instead of classification accuracy. Compared with other ensemble methods, our method shows the best performance and needs fewer labeled samples.
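The abstract above evaluates with F-value and G-mean rather than accuracy. A minimal sketch of both, computed from binary confusion-matrix counts, is given here under stated assumptions: the function name `f_value_and_g_mean` is hypothetical, the minority class is treated as the positive class, and G-mean is taken as the geometric mean of sensitivity and specificity (a common definition; the paper may use a variant).

```python
import math

def f_value_and_g_mean(tp, fp, fn, tn, beta=1.0):
    """Return (F-value, G-mean) from binary confusion-matrix counts.

    The minority class is assumed to be the positive class.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_value = (1 + b2) * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f_value, g_mean

# A classifier that labels every sample as the majority class reaches
# 90% accuracy on this 10/90 split, yet both scores collapse to zero.
print(f_value_and_g_mean(tp=0, fp=0, fn=10, tn=90))  # (0.0, 0.0)
```

The degenerate case in the usage line shows why accuracy alone can mislead on imbalanced data.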
Landslides are a serious natural disaster, second only to earthquakes and floods, and pose a great threat to people's lives and property. Traditional landslide research is based on experience-driven or statistical models, and its assessment results are subjective, difficult to quantify, and lack specificity. As a new research method for landslide susceptibility assessment, machine learning can greatly improve the accuracy of landslide susceptibility models by constructing statistical models. Taking western Henan as an example, the study selected 16 landslide influencing factors covering topography, geological environment, hydrological conditions, and human activities, and the 11 factors with the most significant influence on landslides were selected by the recursive feature elimination (RFE) method. Five machine learning methods [Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Linear Discriminant Analysis (LDA)] were used to construct the spatial distribution model of landslide susceptibility. The models were evaluated by the receiver operating characteristic curve and statistical indices. After analysis and comparison, the XGBoost model (AUC 0.8759) performed the best and was suitable for dealing with regression problems. The model had a high adaptability to landslide data. The overall distribution can be observed from the landslide susceptibility maps of the five models. The extremely high and high susceptibility areas are distributed in the Funiu Mountain range in the southwest, the Xiaoshan Mountain range in the west, and the Yellow River Basin in the north. These areas have large terrain fluctuations, complicated geological structural environments, and frequent human engineering activities. The extremely high and highly prone areas were 12043.3 km² and 3087.45 km², accounting for 47.61% and 12.20% of the total study area, respectively. Our study reflects the distribution of landslide susceptibility in western Henan Province, which provides a scientific basis for regional disaster warning, prediction, and resource protection. The study has important practical significance for subsequent landslide disaster management.
Amongst several biometric traits, the vein pattern biometric has drawn much attention among researchers and diverse users. It gains its importance due to its difficulty in reproduction and inherent security advantages. Many research papers have dealt with the topic of new-generation biometric solutions such as iris and vein biometrics. However, most implementations have been based on small datasets due to the difficulties in obtaining samples. In this paper, a deeper study has been conducted on previously suggested methods based on Convolutional Neural Networks (CNN) using a larger dataset. Also, modifications are suggested for implementation using ensemble methods. Ensembles were used to reduce training time and cost by training multiple weak classifiers instead of a single, strong classifier. The classifiers used were CNN, Random Forest, and Logistic Regression. An inexpensive and robust data acquisition system was also developed for obtaining the dataset. The obtained result shows an improved accuracy of 96.77% using the ensemble method instead of a single classifier.
Production optimization is of significance for carbonate reservoirs, directly affecting the sustainability and profitability of reservoir development. Traditional physics-based numerical simulations suffer from insufficient calculation accuracy and excessive time consumption when performing production optimization. We establish an ensemble proxy-model-assisted optimization framework combining the Bayesian random forest (BRF) with the particle swarm optimization (PSO) algorithm. The BRF method is implemented to construct a proxy model of the injection-production system that can accurately predict the dynamic parameters of producers based on injection data and production measures. With the help of the proxy model, PSO is applied to search for the optimal injection pattern, integrating Pareto front analysis. In experimental testing, the proxy model not only achieves higher prediction accuracy than deep learning, but also requires 8 times less time for training. In addition, the injection mode adjusted by the PSO algorithm can effectively reduce the gas-oil ratio and increase oil production by more than 10% for carbonate reservoirs. The proposed proxy-model-assisted optimization protocol brings new perspectives on multi-objective optimization problems in the petroleum industry, and can provide more options for project decision-makers to balance oil production and the gas-oil ratio considering physical and operational constraints.
This paper presents an efficient prediction model for a good learning environment using a Random Forest (RF) classifier. It consists of a series of modules: data preprocessing, data normalization, data split, and finally classification or prediction by the RF classifier. The preprocessed data is normalized using min-max normalization, which is often used before model fitting. As the input variables are measured at different scales, it is necessary to normalize them so that they contribute equally to the model fitting. Then, the RF classifier, an ensemble learning method, is employed for course selection, and k-fold cross-validation (k=10) is used to validate the model. The proposed Prediction Model for Course Selection (PMCS) system is treated as a multi-class problem that predicts the course for a particular learner with three complexity levels, namely low, medium, and high. It is operated under two modes: local and global. The former considers the gender of the learner and the latter does not. The database comprises learner opinions from 75 males and 75 females per category (low, medium, and high). Thus the system uses a total of 450 samples to evaluate the performance of the PMCS system. Results show that the local (gender-wise) system performs slightly better than the global system. The RF classifier with 75 decision trees in the global system provides an average accuracy of 97.6%, whereas in the local system it is 97% (male) and 97.6% (female). The overall performance of the RF classifier with 75 trees is better than with 25, 50, or 100 decision trees in both local and global systems.
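The preprocessing and validation steps named in the abstract above, min-max normalization and 10-fold cross-validation over 450 samples, can be sketched in plain Python. This is an illustrative sketch only: `min_max_normalize` and `k_fold_indices` are hypothetical helper names, the folds are contiguous and unshuffled, and the RF classifier itself is omitted.

```python
def min_max_normalize(column):
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

print(min_max_normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
for train, test in k_fold_indices(450, k=10):
    # Each fold holds out 45 samples and trains on the remaining 405.
    assert len(test) == 45 and len(train) == 405
```

In practice the sample order would usually be shuffled (and stratified by class) before splitting, and the normalization bounds would be computed on each training fold only, so that the held-out fold does not leak into the scaling.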
The flowering forecast provides recommendations for orchard cleaning, pest control, field management, and fertilization, which can help increase tree vigor and resistance. Flowering forecasting is not only an important part of the construction of the agro-meteorological index system, but also an important part of the meteorological service system. In this paper, by analyzing local meteorological data and phenological data of "Red Fuji" apples in Fen County, Linfen City, Shanxi Province, with the help of machine learning and neural networks, we propose a method that combines time series forecasting and classification forecasting to build a dynamic forecasting model of local flowering in Ji County. Then, we evaluated the effectiveness of the model based on the number of error days and the number of days in advance. The implementation shows that the proposed multivariable LSTM network performs well in predicting meteorological factors. The model loss is less than 0.2. In the binary classification task of flowering judgment, the idea of combining strategies in ensemble learning improves the result, and the AUC value increases from 0.81 and 0.80 for the single models RF and AdaBoost to 0.82. The proposed model has high applicability and accuracy for flowering forecasting. At the same time, the model solves the problem of rounding decimals in the prediction of flowering dates by the regression method.
Parking space is usually very limited in major cities, especially Cairo, leading to traffic congestion, air pollution, and driver frustration. Existing car parking systems tend to tackle parking issues in a non-digitized manner. These systems require drivers to search for an empty parking space with no guarantee of finding any, wasting time and resources and causing unnecessary congestion. To address these issues, this paper proposes a digitized parking system with a proof-of-concept implementation that combines multiple technological concepts into one solution, with the advantage of using IoT for real-time tracking of parking availability. User authentication and automated payments are handled using a quick response (QR) code on entry and exit. Experiments were done on real data collected for six different locations in Cairo via a live popular-times library. Several machine learning models were investigated in order to estimate the occupancy rate of certain places. Moreover, a clear analysis of the differences in performance is presented, with the final deployed model being XGBoost. It achieved the best results, with an R² score of 85.7%.
Urban living in large modern cities exerts considerable adverse effects on health and thus increases the risk of contracting several chronic kidney diseases (CKD). The prediction of CKDs has become a major task in urbanized countries. The primary objective of this work is to introduce and develop predictive analytics for predicting CKDs. However, prediction over huge samples is becoming increasingly difficult. Meanwhile, MapReduce provides a feasible framework for programming predictive algorithms with map and reduce functions. The relatively simple programming interface helps solve problems in the scalability and efficiency of predictive learning algorithms. In the proposed work, the iterative weighted map reduce framework is introduced for the effective management of large dataset samples. A binary classification problem is formulated using ensemble nonlinear support vector machines and random forests. Thus, instead of using the normal linear combination of kernel activations, the proposed work creates nonlinear combinations of kernel activations in prototype examples. Furthermore, different descriptors are combined in an ensemble of deep support vector machines, where the product rule is used to combine probability estimates of different classifiers. Performance is evaluated in terms of the prediction accuracy and interpretability of the model and the results.
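The product rule mentioned in the abstract above multiplies the per-class probability estimates of the individual classifiers and renormalizes. The sketch below illustrates only that combination step; `product_rule` is a hypothetical name, the probabilities are made-up inputs, and the classifiers that would produce them are not shown.

```python
def product_rule(prob_estimates):
    """Combine per-classifier class probabilities with the product rule.

    prob_estimates: one list of class probabilities per classifier,
    all over the same ordered set of classes.
    """
    n_classes = len(prob_estimates[0])
    combined = [1.0] * n_classes
    for probs in prob_estimates:
        if len(probs) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        for c, p in enumerate(probs):
            combined[c] *= p
    total = sum(combined)
    if total == 0:
        raise ValueError("every class was assigned zero probability")
    # Renormalize so the combined scores again sum to 1.
    return [v / total for v in combined]

# Two hypothetical classifiers scoring the classes (CKD, not CKD).
fused = product_rule([[0.6, 0.4], [0.7, 0.3]])
print([round(p, 3) for p in fused])  # [0.778, 0.222]
```

One design consequence worth noting: a single classifier assigning probability zero to a class vetoes that class outright, so the product rule is sensitive to overconfident members.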
Difficulty in communicating and interacting with other people is mainly due to the neurological disorder called autism spectrum disorder (ASD). The disorder can affect the nerves at any stage of human life: childhood, adolescence, or adulthood. ASD is known as a behavioral disease because symptoms appear over the first two years of life and continue until adulthood. Most studies show that the early detection of ASD helps improve the behavioral characteristics of patients with ASD. The detection of ASD remains a very challenging task for researchers. Machine learning (ML) algorithms can learn from complex data and predict high-quality results. In this paper, ensemble ML techniques for the early detection of ASD are proposed. In this detection, the dataset is first processed using three ML algorithms: sequential minimal optimization with support vector machine, the Kohonen self-organizing neural network, and the random forest algorithm. The prediction results of these ML algorithms (the ensemble) are then combined using the bagging concept called max voting to predict the final result. The accuracy, sensitivity, and specificity of the proposed system are calculated using a confusion matrix. The proposed ensemble technique performs better than state-of-the-art ML algorithms.
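The max voting step described in the abstract above takes, for each sample, the label predicted by most of the member models. A minimal sketch follows; `max_vote` and the prediction lists are hypothetical stand-ins for the outputs of the three trained models, which are not reproduced here.

```python
from collections import Counter

def max_vote(predictions):
    """Return the per-sample majority label.

    predictions: one list of predicted labels per model, all of the
    same length and in the same sample order.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all models must predict the same samples")
    voted = []
    for sample_votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted

# Hypothetical binary predictions (1 = ASD, 0 = no ASD) for four samples.
svm_pred = [1, 0, 1, 1]
som_pred = [1, 1, 0, 1]
rf_pred = [0, 0, 1, 1]
print(max_vote([svm_pred, som_pred, rf_pred]))  # [1, 0, 1, 1]
```

With three voters and two classes a tie cannot occur; with an even number of models or more classes, a tie-breaking rule would need to be chosen explicitly.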
Diabetes is a hereditary disorder that interferes with human life at all ages. It is challenging for cells to absorb glucose from the bloodstream when an individual has diabetes. The two main subtypes of diabetes are type 1 diabetes and type 2 diabetes. Type 1 diabetes develops when the pancreas cannot make enough insulin, whereas type 2 diabetes spreads due to insulin resistance. Diabetes is a recurrent and chronic illness that is incurable. In modern healthcare systems, disease detection technology is pervasive. Detecting diabetes in its early stages is crucial for initiating timely treatment and halting disease progression. The proposed method has the potential not only to forecast the likelihood of future diabetes onset but also to identify the specific type of diabetes a person may develop. This paper investigates a potential solution for a diabetes prediction model in light of the continually rising prevalence of diabetes among patients. The proposed framework is designed using two datasets: the Pima Indian dataset (PID), which is used to forecast diabetes, and the DiabetesType dataset, which is used to identify the type of diabetes mellitus an individual has. This research aims to apply machine learning classifiers and ensemble models, such as Bagging, Voting, Averaging, and Stacking, for diabetes prediction. In this context, SMOTE (synthetic minority oversampling technique) and hyperparameter adjustment of the algorithms are considered and have substantially improved the findings. The developed heterogeneous ensemble model offers enhanced prediction rates with different performance criteria. Using the bagging technique, random forest attains a 96% accuracy rate, resulting in better predictions on the PID dataset. Regarding the DiabetesType dataset, the voting ensemble model provides a 98.5% accuracy rate. This study highlights that ensemble learning models are effective in predicting diabetes and can outperform earlier relevant studies.
Fatigue reliability-based design optimization of aeroengine structures involves multiple repeated calculations of the reliability degree and large-scale calls of an implicit, highly nonlinear limit state function, which makes the traditional direct Monte Carlo and surrogate methods prone to unacceptable computing efficiency and accuracy. In this case, by fusing the random subspace strategy and weight allocation technology into bagging ensemble theory, a random forest (RF) model is presented to enhance the computing efficiency of the reliability degree; moreover, by embedding the RF model into a multilevel optimization model, an efficient RF-assisted fatigue reliability-based design optimization framework is developed. The effectiveness of the presented framework is validated using the low-cycle fatigue reliability-based design optimization of an aeroengine turbine disc as a case study. The reliability-based design optimization results show that the proposed framework achieves high computing accuracy and computing efficiency. The current efforts shed light on the theory and method development of reliability-based design optimization of complex engineering structures.
As a complex and widely studied problem in the financial field, stock trend forecasting uses a large amount of data and many related indicators; hence it is difficult to obtain sustainable and effective results by relying only on empirical analysis. Researchers in the field of machine learning have shown that random forest can form better judgements on this kind of problem, and that it has an auxiliary role in the prediction of stock trends. This study uses historical trading data of four listed companies in the USA stock market, and its purpose is to improve the performance of the random forest model in medium- and long-term stock trend prediction. This study applies the exponential smoothing method to process the initial data, calculates the relevant technical indicators as candidate features, and proposes the D-RF-RS method to optimize random forest. As random forest is an ensemble learning model closely related to the decision tree, the D-RF-RS method uses a decision tree to screen the importance of features and obtains the effective strong feature set of the model as input. Then, the parameter combination of the model is optimized through random parameter search. The experimental results show that the average accuracy of random forest is increased by 0.17 after the above optimization, which is 0.18 higher than the average accuracy of the light gradient boosting machine model. Combined with the performance of the ROC curve and Precision-Recall curve, the stability of the model is also guaranteed, which further demonstrates the advantages of random forest in medium- and long-term trend prediction of the stock market.
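The exponential smoothing step that the abstract above applies to the raw trading data can be illustrated with simple (single) exponential smoothing. This is a sketch under assumptions: the abstract does not state which variant or smoothing factor was used, so `exponential_smooth`, the value `alpha=0.5`, and the price series are all hypothetical.

```python
def exponential_smooth(series, alpha):
    """Simple exponential smoothing.

    s[0] = x[0];  s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    A larger alpha tracks recent values more closely.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical closing prices; alpha is an assumed smoothing factor.
print(exponential_smooth([10, 20, 30], alpha=0.5))  # [10.0, 15.0, 22.5]
```

Smoothing of this kind damps day-to-day noise before technical indicators are computed, which is the role the abstract assigns to it.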
Funding: Supported by the National Natural Science Foundation of China Youth Fund (12105234).
Abstract: The distribution of the nuclear ground-state spin in a two-body random ensemble (TBRE) was studied using a general classification neural network (NN) model with two-body interaction matrix elements as input features and the corresponding ground-state spins as labels or output predictions. The quantum many-body problem exceeds the capability of our optimized NNs in terms of accurately predicting the ground-state spin of each sample within the TBRE. However, our NN model effectively captured the statistical properties of the ground-state spin because it learned the empirical regularity of the ground-state spin distribution in the TBRE, as discovered by physicists.
Funding: The Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) under contract No. SML2020SP009; the National Basic Research and Development Program of China under contract Nos. 2022YFF0802000 and 2022YFF0802004; the "Renowned Overseas Professors" Project of Guangdong Provincial Department of Science and Technology under contract No. 76170-52910004; the Belt and Road Special Foundation of the National Key Laboratory of Water Disaster Prevention under contract No. 2022491711; the National Natural Science Foundation of China under contract No. 51909290; the Key Research and Development Program of Guangdong Province under contract No. 2020B1111020003.
Abstract: Forecasting of ocean currents is critical for both marine meteorological research and ocean engineering and construction. Timely and accurate forecasting of coastal current velocities offers a scientific foundation and decision support for multiple practices such as search and rescue, disaster avoidance and remediation, and offshore construction. This research established a framework to generate short-term surface current forecasts based on ensemble machine learning trained on high-frequency radar observations. Results indicate that an ensemble algorithm that uses random forests to filter forecasting features by weighting them, and then uses the AdaBoost method to forecast, can significantly reduce the model training time while preserving forecasting effectiveness, with great economic benefits. Model accuracy is a function of surface current variability and the forecasting horizon. To improve the forecasting capability and accuracy of the model, the structure of the ensemble algorithm was optimized, and the random forest algorithm was used to dynamically select model features. The results show that the error of the optimized surface current forecasting model varies more regularly, and that the importance of the features varies with the forecasting time-step. At the ten-step-ahead forecasting horizon, the model reported a root mean square error, mean absolute error, and correlation coefficient of 2.84 cm/s, 2.02 cm/s, and 0.96, respectively. The model error is affected by factors such as topography, boundaries, and the geometric accuracy of the observation system. This paper demonstrates the potential of ensemble-based machine learning algorithms to improve forecasting of ocean currents.
Funding: Supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (2021-0-00755, Dark Data Analysis Technology for Data Scale and Accuracy Improvement). This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R407), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Thyroid disorders represent a significant global health challenge, with hypothyroidism and hyperthyroidism as two common conditions arising from dysfunction in the thyroid gland. Accurate and timely diagnosis of these disorders is crucial for effective treatment and patient care. This research introduces a comprehensive approach to improve the accuracy of thyroid disorder diagnosis through the integration of ensemble stacking and advanced feature selection techniques. Sequential forward feature selection, sequential backward feature elimination, and bidirectional feature elimination are investigated in this study. In ensemble learning, random forest, adaptive boosting, and bagging classifiers are employed. The effectiveness of these techniques is evaluated using two different datasets obtained from the University of California Irvine Machine Learning Repository, both of which undergo preprocessing steps, including outlier removal, addressing missing data, data cleansing, and feature reduction. Extensive experimentation demonstrates the success of the proposed ensemble stacking and bidirectional feature elimination, achieving 100% and 99.86% accuracy in identifying hyperthyroidism and hypothyroidism, respectively. Beyond enhancing detection accuracy, the ensemble stacking model also demonstrated a streamlined computational complexity, which is pivotal for practical medical applications. It significantly outperformed existing studies with similar objectives, underscoring the viability and effectiveness of the proposed scheme. This research offers an innovative perspective and sets the platform for improved thyroid disorder diagnosis, with broader implications for healthcare and patient well-being.
Funding: Funded by the Key-Area Research and Development Program Project of Guangdong Province (2021B0101230003) and the China Southern Power Grid Science and Technology Project (ZBKJXM20220004).
Abstract: As the global demand for renewable energy grows, solar energy is gaining attention as a clean, sustainable energy source. Accurate assessment of solar energy resources is crucial for the siting and design of photovoltaic power plants. This study proposes an integrated deep learning-based photovoltaic resource assessment method. Ensemble learning and deep learning methods are fused for photovoltaic resource assessment for the first time. The proposed method combines the random forest, gated recurrent unit, and long short-term memory to effectively improve the accuracy and reliability of photovoltaic resource assessment. The proposed method has strong adaptability and high accuracy even in the photovoltaic resource assessment of complex terrain and landscape. The experimental results show that the proposed method outperforms the comparison algorithms in all evaluation indexes, indicating that it has higher accuracy and reliability in photovoltaic resource assessment, with improved generalization performance over traditional single algorithms.
Abstract: This study explores the area of Author Profiling (AP) and its importance in several industries, including forensics, security, marketing, and education. A key component of AP is the extraction of useful information from text, with an emphasis on the writers' ages and genders. To improve the accuracy of AP tasks, the study develops an ensemble model dubbed ABMRF that combines AdaBoostM1 (ABM1) and Random Forest (RF). The work uses an extensive technique that involves text message dataset preprocessing, model training, and assessment. Several machine learning (ML) algorithms are compared to evaluate their effectiveness in classifying age and gender, including Composite Hypercube on Random Projection (CHIRP), Decision Trees (J48), Naïve Bayes (NB), K Nearest Neighbor, AdaBoostM1, NB-Updatable, RF, and ABMRF. The findings demonstrate that ABMRF regularly beats the competition, with a gender classification accuracy of 71.14% and an age classification accuracy of 54.29%. Additional metrics like precision, recall, F-measure, Matthews Correlation Coefficient (MCC), and accuracy support ABMRF's outstanding performance in age and gender profiling tasks. This study demonstrates the usefulness of ABMRF as an ensemble model for author profiling and highlights its possible uses in marketing, law enforcement, and education. The results emphasize the effectiveness of ensemble approaches in enhancing author profiling task accuracy, particularly when it comes to age and gender identification.
Funding: Supported by the High-end Foreign Expert Introduction program (No. G20190022002), the Chongqing Construction Science and Technology Plan Project (2019-0045), and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJZD-K201900102). The financial support is gratefully acknowledged.
Abstract: This paper adopts the NGI-ADP soil model to carry out finite element analysis, based on which the effects of soft clay anisotropy on the diaphragm wall deflections in braced excavation were evaluated. More than one thousand finite element cases were numerically analyzed, followed by extensive parametric studies. Surrogate models were developed via ensemble learning methods (ELMs), including eXtreme Gradient Boosting (XGBoost) and Random Forest Regression (RFR), to predict the maximum lateral wall deformation (δhmax). The results of the ELMs were then compared with conventional soft computing methods such as Decision Tree Regression (DTR), Multilayer Perceptron Regression (MLPR), and Multivariate Adaptive Regression Splines (MARS). This study presents a cutting-edge application of ensemble learning in geotechnical engineering and a reasonable methodology that allows engineers to determine the wall deflection in a fast, alternative way.
Funding: Performed in the project "Cloud Interaction Technology and Service Platform for Mine Internet of things", supported by the National Key Research and Development Program of China (2017YFC0804406); partly supported by the project "Massive DDoS Attack Traffic Detection Technology Research based on Big Data and Cloud Environment", supported by the Scientific Research Foundation of Shandong University of Science and Technology for Recruited Talents (0104060511314).
Abstract: With the rapid development of blockchain technology, research on blockchain security theory and its practical application has become crucial. At present, a new type of DDoS attack has arisen: the DDoS attack in the blockchain network. The attack is harmful to blockchain technology and many application scenarios. However, traditional and existing DDoS attack detection and defense means mainly rely on centralized tactics and solutions. To address this problem, the paper proposes the virtual reality parallel anti-DDoS chain design philosophy and a distributed anti-D chain detection framework based on hybrid ensemble learning. Here, AdaBoost and Random Forest are used as the ensemble learning strategy, and several different lightweight classifiers, such as CART and ID3, are integrated into the same ensemble learning algorithm. The detection framework in the blockchain scene has much stronger generalization performance, universality, and complementarity for accurately identifying the features of DDoS attacks in a P2P network. Extensive experimental results confirm that the distributed heterogeneous anti-D chain detection method has better performance in six important indicators (Precision, Recall, F-Score, True Positive Rate, False Positive Rate, and ROC curve).
Abstract: The exponential growth of Internet and network usage has necessitated heightened security measures to protect against data and network breaches. Intrusions, executed through network packets, pose a significant challenge for firewalls to detect and prevent due to the similarity between legitimate and intrusion traffic. The vast network traffic volume also complicates most network monitoring systems and algorithms. Several intrusion detection methods have been proposed, with machine learning techniques regarded as promising for dealing with these incidents. This study presents an Intrusion Detection System (IDS) based on stacking ensemble learning with Random Forest, Decision Tree, and k-Nearest-Neighbors as base models. The proposed system employs pre-processing techniques to enhance classification efficiency and integrates seven machine learning algorithms. The stacking ensemble technique increases performance by incorporating three base models (Random Forest, Decision Tree, and k-Nearest-Neighbors) and a meta-model represented by the Logistic Regression algorithm. Evaluated using the UNSW-NB15 dataset, the proposed IDS achieved an accuracy of 96.16% in the training phase and 97.95% in the testing phase, with precision of 97.78% and 98.40% for training and testing, respectively. The obtained results demonstrate improvements in other measurement criteria.
Abstract: In medical diagnosis, the problem of class imbalance is common. Though unlabeled data are abundant, labeled data are difficult and expensive to obtain. In this paper, an ensemble-based active learning algorithm is proposed to address the class imbalance problem. Artificial data are created according to the distribution of the training dataset to make the ensemble diverse, and the random subspace re-sampling method is used to reduce the data dimension. In selecting member classifiers based on misclassification cost estimation, the minority class is assigned higher weights for misclassification costs, while each testing sample has a variable penalty factor to induce the ensemble to correct the current error. In our experiments with UCI disease datasets, F-value and G-means are used as the evaluation criteria instead of classification accuracy. Compared with other ensemble methods, our method shows the best performance and needs fewer labeled samples.
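As a rough illustration of why F-value and G-mean are preferred over accuracy on imbalanced data, the following minimal sketch computes both from a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the paper; the formulas are the standard definitions, with the minority class treated as positive.

```python
import math

def f_value_and_g_mean(tp, fp, fn, tn, beta=1.0):
    """Return (F-value, G-mean) for a binary confusion matrix.

    The minority class is assumed to be the positive class.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_value = (1 + b2) * precision * recall / denom if denom else 0.0
    # G-mean is the geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)
    return f_value, g_mean

# Hypothetical counts: 10 minority samples and 990 majority samples.
f, g = f_value_and_g_mean(tp=8, fp=20, fn=2, tn=970)
accuracy = (8 + 970) / 1000
print(f"accuracy={accuracy:.3f}  F-value={f:.3f}  G-mean={g:.3f}")
```

With counts like these, accuracy stays high because the majority class dominates, while the F-value exposes the low precision on the minority class.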
Funding: This work was financially supported by the National Natural Science Foundation of China (41972262), the Hebei Natural Science Foundation for Excellent Young Scholars (D2020504032), the Central Plains Science and Technology Innovation Leader Project (214200510030), and the Key Research and Development Project of Henan Province (221111321500).
Abstract: Landslides are a serious natural disaster, next only to earthquakes and floods, and pose a great threat to people's lives and property. Traditional landslide research relies on experience-driven or statistical models, and its assessment results are subjective, difficult to quantify, and lack specificity. As a new research method for landslide susceptibility assessment, machine learning can greatly improve the accuracy of landslide susceptibility models. Taking western Henan as an example, the study selected 16 landslide influencing factors, such as topography, geological environment, hydrological conditions, and human activities, from which the 11 factors with the most significant influence on landslides were selected by the recursive feature elimination (RFE) method. Five machine learning methods [Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Linear Discriminant Analysis (LDA)] were used to construct the spatial distribution model of landslide susceptibility. The models were evaluated by the receiver operating characteristic curve and statistical indices. After analysis and comparison, the XGBoost model (AUC 0.8759) performed the best and was suitable for dealing with regression problems. The model had a high adaptability to landslide data. The overall distribution can be observed from the landslide susceptibility maps of the five models. The extremely high and high susceptibility areas are distributed in the Funiu Mountain range in the southwest, the Xiaoshan Mountain range in the west, and the Yellow River Basin in the north. These areas have large terrain fluctuations, complicated geological structural environments, and frequent human engineering activities. The extremely high and highly prone areas were 12043.3 km² and 3087.45 km², accounting for 47.61% and 12.20% of the total study area, respectively. Our study reflects the distribution of landslide susceptibility in western Henan Province, which provides a scientific basis for regional disaster warning, prediction, and resource protection. The study has important practical significance for subsequent landslide disaster management.
Abstract: Amongst several biometric traits, vein pattern biometrics has drawn much attention among researchers and diverse users. It gains its importance from its difficulty of reproduction and inherent security advantages. Many research papers have dealt with new-generation biometric solutions such as iris and vein biometrics. However, most implementations have been based on small datasets due to the difficulties in obtaining samples. In this paper, a deeper study has been conducted on previously suggested methods based on Convolutional Neural Networks (CNN) using a larger dataset. Modifications are also suggested for implementation using ensemble methods. Ensembles were used to reduce training time and cost by training multiple weak classifiers instead of a single, strong classifier. The classifiers used were CNN, Random Forest, and Logistic Regression. An inexpensive and robust data acquisition system was also developed for obtaining the dataset. The obtained result shows an improved accuracy of 96.77% using the ensemble method instead of a single classifier.
Funding: Financial support for this work from the National Natural Science Foundation of China (Grant Nos. 11972073, 51974357, and 52274027), the China Postdoctoral Science Foundation (Grant No. 2022M713204), and the Scientific Research and Technology Development Project of China National Petroleum Corporation (Grant No. 2121DJ2301).
Abstract: Production optimization is of significance for carbonate reservoirs, directly affecting the sustainability and profitability of reservoir development. Traditional physics-based numerical simulations suffer from insufficient calculation accuracy and excessive time consumption when performing production optimization. We establish an ensemble proxy-model-assisted optimization framework combining the Bayesian random forest (BRF) with the particle swarm optimization algorithm (PSO). The BRF method is implemented to construct a proxy model of the injection-production system that can accurately predict the dynamic parameters of producers based on injection data and production measures. With the help of the proxy model, PSO is applied to search for the optimal injection pattern, integrating Pareto front analysis. In experimental testing, the proxy model not only achieves higher prediction accuracy than deep learning but also requires 8 times less time for training. In addition, the injection mode adjusted by the PSO algorithm can effectively reduce the gas-oil ratio and increase the oil production by more than 10% for carbonate reservoirs. The proposed proxy-model-assisted optimization protocol brings new perspectives on multi-objective optimization problems in the petroleum industry and can provide more options for project decision-makers to balance the oil production and the gas-oil ratio considering physical and operational constraints.
Abstract: This paper presents an efficient prediction model for a good learning environment using a Random Forest (RF) classifier. It consists of a series of modules: data preprocessing, data normalization, data split, and finally classification or prediction by the RF classifier. The preprocessed data is normalized using min-max normalization, which is often used before model fitting. As the input variables are measured at different scales, it is necessary to normalize them so that they contribute equally to the model fitting. Then, the RF classifier, an ensemble learning method, is employed for course selection, and k-fold cross-validation (k=10) is used to validate the model. The proposed Prediction Model for Course Selection (PMCS) system is treated as a multi-class problem that predicts the course for a particular learner with three complexity levels, namely low, medium, and high. It is operated under two modes: locally and globally. The former considers the gender of the learner and the latter does not. The database comprises the learner opinions from 75 males and 75 females per category (low, medium, and high). Thus, a total of 450 samples is used to evaluate the performance of the PMCS system. Results show that the local (gender-wise) system has slightly higher performance than the global system. The RF classifier with 75 decision trees in the global system provides an average accuracy of 97.6%, whereas in the local system it is 97% (male) and 97.6% (female). The overall performance of the RF classifier with 75 trees is better than that with 25, 50, and 100 decision trees in both local and global systems.
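The normalization and validation steps described in this abstract can be sketched as follows. This is a minimal stdlib-only illustration, not the authors' implementation: the function names are hypothetical, the folds are contiguous and unshuffled (an assumption made for simplicity), and the RF classifier itself is omitted. Only the sample count (450) and k=10 come from the abstract.

```python
def min_max_normalize(column):
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Features on different scales end up on a common [0, 1] scale.
print(min_max_normalize([2, 4, 6]))

# 450 samples with k=10, as in the abstract: each fold holds out 45 samples.
folds = list(k_fold_indices(450, 10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

In practice the samples would be shuffled (or stratified by class) before splitting, so that each fold contains all three complexity levels.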
Abstract: The flowering forecast provides recommendations for orchard cleaning, pest control, field management, and fertilization, which can help increase tree vigor and resistance. Flowering forecasting is an important part of both the agro-meteorological index system and the meteorological service system. In this paper, by analyzing local meteorological data and phenological data of "Red Fuji" apples in Fen County, Linfen City, Shanxi Province, with the help of machine learning and neural networks, we propose a method that combines time series forecasting and classification forecasting to build a dynamic forecasting model of local flowering in Ji County. We then evaluated the effectiveness of the model based on the number of error days and the number of days in advance. The implementation shows that the proposed multivariable LSTM network performs well in predicting meteorological factors, with a model loss of less than 0.2. In the binary classification task of flowering judgment, the idea of combining strategies in ensemble learning improves the flowering judgment, and the AUC value increases from 0.81 and 0.80 for the single models RF and AdaBoost to 0.82. The proposed model has high applicability and accuracy for flowering forecasting. At the same time, the model solves the problem of rounding decimals in the prediction of flowering dates by the regression method.
Abstract: Parking space is usually very limited in major cities, especially Cairo, leading to traffic congestion, air pollution, and driver frustration. Existing car parking systems tend to tackle parking issues in a non-digitized manner. These systems require drivers to search for an empty parking space with no guarantee of finding one, wasting time and resources and causing unnecessary congestion. To address these issues, this paper proposes a digitized parking system with a proof-of-concept implementation that combines multiple technological concepts into one solution, with the advantage of using IoT for real-time tracking of parking availability. User authentication and automated payments are handled using a quick response (QR) code on entry and exit. Experiments were done on real data collected for six different locations in Cairo via a live popular times library. Several machine learning models were investigated in order to estimate the occupancy rate of certain places. Moreover, a clear analysis of the differences in performance is presented, with the final deployed model being XGBoost. It achieved the best results, with an R² score of 85.7%.
Abstract: Urban living in large modern cities exerts considerable adverse effects on health and thus increases the risk of contracting several chronic kidney diseases (CKD). The prediction of CKDs has become a major task in urbanized countries. The primary objective of this work is to introduce and develop predictive analytics for predicting CKDs. However, prediction on huge samples is becoming increasingly difficult. Meanwhile, MapReduce provides a feasible framework for programming predictive algorithms with map and reduce functions. The relatively simple programming interface helps solve problems in the scalability and efficiency of predictive learning algorithms. In the proposed work, the iterative weighted MapReduce framework is introduced for the effective management of large dataset samples. A binary classification problem is formulated using ensemble nonlinear support vector machines and random forests. Thus, instead of using the normal linear combination of kernel activations, the proposed work creates nonlinear combinations of kernel activations in prototype examples. Furthermore, different descriptors are combined in an ensemble of deep support vector machines, where the product rule is used to combine probability estimates of different classifiers. Performance is evaluated in terms of the prediction accuracy and interpretability of the model and the results.
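The product rule mentioned in this abstract can be illustrated with a short sketch: each classifier's class-probability vector is multiplied element-wise and the result is renormalized. The function name and the example probabilities are hypothetical; the abstract does not specify how the authors handle edge cases such as an all-zero product, so the uniform fallback below is an assumption.

```python
def product_rule(prob_estimates):
    """Combine per-classifier class probabilities by the product rule.

    prob_estimates: list of probability vectors, one per classifier,
    all over the same ordered set of classes.
    """
    n_classes = len(prob_estimates[0])
    combined = [1.0] * n_classes
    for probs in prob_estimates:
        for c in range(n_classes):
            combined[c] *= probs[c]
    total = sum(combined)
    if total == 0:
        # Assumption: fall back to a uniform distribution if every class
        # was given zero probability by at least one classifier.
        return [1.0 / n_classes] * n_classes
    return [p / total for p in combined]

# Two hypothetical classifiers scoring the classes [CKD, not CKD].
fused = product_rule([[0.6, 0.4], [0.7, 0.3]])
print(fused)
```

Because probabilities are multiplied, a single classifier that assigns a class a probability near zero effectively vetoes that class, which is the main behavioral difference from averaging.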
Abstract: Difficulty in communicating and interacting with other people is mainly due to the neurological disorder called autism spectrum disorder (ASD). The disorder can affect a person at any stage of life: childhood, adolescence, or adulthood. ASD is known as a behavioral disease because its symptoms appear over the first two years of life and continue until adulthood. Most studies show that the early detection of ASD helps improve the behavioral characteristics of patients with ASD. The detection of ASD remains a very challenging task for researchers. Machine learning (ML) algorithms can learn complex data and predict quality results. In this paper, ensemble ML techniques for the early detection of ASD are proposed. In this detection, the dataset is first processed using three ML algorithms: sequential minimal optimization with support vector machine, Kohonen self-organizing neural network, and random forest. The prediction results of these ML algorithms (the ensemble) are then combined using the bagging concept called max voting to predict the final result. The accuracy, sensitivity, and specificity of the proposed system are calculated using a confusion matrix. The proposed ensemble technique performs better than state-of-the-art ML algorithms.
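The max-voting step described in this abstract can be sketched as a plain majority vote over the labels predicted by the three base models. The prediction lists below are hypothetical placeholders, not results from the paper, and the tie-breaking behavior (first label seen wins) is an assumption that cannot arise with three voters and two classes.

```python
from collections import Counter

def max_vote(predictions):
    """Return the label predicted by the most base classifiers.

    Ties are broken in favour of the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-sample predictions from the three base models.
svm_preds = ["ASD", "no", "ASD", "no"]
som_preds = ["ASD", "ASD", "no", "no"]
rf_preds = ["no", "ASD", "ASD", "no"]

# Vote sample by sample across the three models.
final = [max_vote(votes) for votes in zip(svm_preds, som_preds, rf_preds)]
print(final)
```

Sensitivity and specificity would then be computed from the confusion matrix of `final` against the true labels.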
Abstract: Diabetes is a hereditary disorder that interferes with human life at all ages. It is challenging for cells to absorb glucose from the bloodstream when an individual has diabetes. The two main subtypes of diabetes are type 1 diabetes and type 2 diabetes. Type 1 diabetes develops when the pancreas cannot make enough insulin, whereas type 2 diabetes arises from insulin resistance. Diabetes is a recurrent and chronic illness that is incurable. In modern healthcare systems, disease detection technology is pervasive. Detecting diabetes in its early stages is crucial for initiating timely treatment and halting disease progression. The proposed method has the potential not only to forecast the likelihood of future diabetes onset but also to identify the specific type of diabetes a person may develop. This paper investigates a potential solution for a diabetes prediction model in light of the continually rising prevalence of diabetes among patients. The proposed framework is designed using two datasets: the Pima Indian dataset (PID), which is used to forecast diabetes, and the DiabetesType dataset, which is used to identify the type of diabetes mellitus an individual has. This research aims to apply machine learning classifiers and ensemble models, such as Bagging, Voting, Averaging, and Stacking, for diabetes prediction. In this context, SMOTE (synthetic minority oversampling technique) and hyperparameter adjustment of the algorithms are considered and have substantially improved the findings. The developed heterogeneous ensemble model offers enhanced prediction rates under different performance criteria. Using the bagging technique, random forest attains a 96% accuracy rate, resulting in better predictions on the PID dataset. On the DiabetesType dataset, the voting ensemble model provides a 98.5% accuracy rate. This study highlights that ensemble learning models are effective in predicting diabetes and can outperform earlier relevant studies.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 52105136, the Hong Kong Scholar program under Grant No. XJ2022013, the China Postdoctoral Science Foundation under Grant No. 2021M690290, and the Academic Excellence Foundation of BUAA under Grant No. BY2004103.
Abstract: Fatigue reliability-based design optimization of aeroengine structures involves multiple repeated calculations of reliability degree and large-scale calls of an implicit, highly nonlinear limit state function, which makes the traditional direct Monte Carlo and surrogate methods prone to unacceptable computing efficiency and accuracy. In this case, by fusing the random subspace strategy and weight allocation technology into bagging ensemble theory, a random forest (RF) model is presented to enhance the computing efficiency of reliability degree; moreover, by embedding the RF model into a multilevel optimization model, an efficient RF-assisted fatigue reliability-based design optimization framework is developed. The effectiveness of the presented framework is validated using the low-cycle fatigue reliability-based design optimization of an aeroengine turbine disc as a case study. The reliability-based design optimization results show that the proposed framework achieves high computing accuracy and computing efficiency. The current efforts shed light on the theory and method development of reliability-based design optimization of complex engineering structures.
Funding: National Natural Science Foundation of China, Grant/Award Number: 61673084; the Fundamental Research Foundation for Universities of Heilongjiang Province, Grant/Award Number: LGYC2018JC017.
Abstract: As a complex and widely studied problem in the financial field, stock trend forecasting uses a large amount of data and many related indicators; hence it is difficult to obtain sustainable and effective results by relying on empirical analysis alone. Researchers in the field of machine learning have shown that random forest can form better judgements on this kind of problem, and it has an auxiliary role in the prediction of stock trends. This study uses historical trading data of four listed companies in the USA stock market, and its purpose is to improve the performance of the random forest model in medium- and long-term stock trend prediction. This study applies the exponential smoothing method to process the initial data, calculates the relevant technical indicators as candidate features, and proposes the D-RF-RS method to optimize random forest. As the random forest is an ensemble learning model closely related to the decision tree, the D-RF-RS method uses a decision tree to screen the importance of features and obtains the effective strong feature set of the model as input. Then, the parameter combination of the model is optimized through random parameter search. The experimental results show that the average accuracy of random forest is increased by 0.17 after the above optimization, which is 0.18 higher than the average accuracy of the light gradient boosting machine model. Combined with the performance of the ROC curve and Precision-Recall curve, the stability of the model is also guaranteed, which further demonstrates the advantages of random forest in medium- and long-term trend prediction of the stock market.