Journal Articles
13 articles found
A Two-Step Algorithm to Estimate Variable Importance for Multi-State Data: An Application to COVID-19
1
Authors: Behnaz Alafchi, Leili Tapak, Hassan Doosti, Christophe Chesneau, Ghodratollah Roshanaei. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, Issue 6, pp. 2047-2064 (18 pages)
Survival data with a multi-state structure are frequently observed in follow-up studies. An analytic approach based on a multi-state model (MSM) should be used in longitudinal health studies in which a patient experiences a sequence of clinical progression events. One main objective in the MSM framework is variable selection, where attempts are made to identify the risk factors associated with the transition hazard rates or probabilities of disease progression. The usual variable selection methods, including stepwise and penalized methods, do not provide information about the importance of variables. In this context, we present a two-step algorithm to evaluate the importance of variables for multi-state data. Three different machine learning approaches (random forest, gradient boosting, and neural network), as the most widely used methods, are considered to estimate variable importance in order to identify the factors affecting disease progression and rank these factors according to their importance. The performance of our proposed methods is validated by simulation, and the methods are applied to a COVID-19 data set. The results revealed that the proposed two-step method has promising performance for estimating variable importance.
Keywords: multi-state data; deviance residual; martingale residual; gradient boosting; random forest; neural network; variable importance; variable selection
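The abstract of result 1 does not spell out the two steps. One plausible reading, suggested by its keywords (martingale and deviance residuals), is that step one fits a survival model per transition and extracts a residual per subject, and step two uses those residuals as the response for random forest, gradient boosting or a neural network, whose variable importance is then read off. The sketch below assumes that reading and shows only the residual transform: the standard Cox-model deviance residual computed from a martingale residual m = δ - Ĥ and an event indicator δ. The function name and toy values are hypothetical.

```python
import math


def deviance_residual(martingale: float, event: int) -> float:
    """Deviance residual d = sign(m) * sqrt(-2 * (m + delta * log(delta - m))),
    where m = delta - H_hat is the martingale residual and delta is 1 for an
    observed transition, 0 if censored."""
    if event not in (0, 1):
        raise ValueError("event must be 0 or 1")
    # delta * log(delta - m) is taken as 0 for censored subjects (delta = 0)
    log_term = math.log(event - martingale) if event == 1 else 0.0
    inner = -2.0 * (martingale + log_term)
    # clamp tiny negative values caused by floating-point rounding
    return math.copysign(math.sqrt(max(inner, 0.0)), martingale)


# Hypothetical step-two response: one deviance residual per subject for a transition
martingale = [0.7, -0.5, 0.0, -1.2]
events = [1, 0, 1, 0]
response = [deviance_residual(m, d) for m, d in zip(martingale, events)]
```

The deviance transform is often preferred over raw martingale residuals as a regression target because it is more symmetric around zero.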
Variable Importance Measure System Based on Advanced Random Forest
2
Authors: Shufang Song, Ruyang He, Zhaoyin Shi, Weiya Zhang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2021, Issue 7, pp. 65-85 (21 pages)
The variable importance measure (VIM) can be used to rank or select important variables, which can effectively reduce the variable dimension and shorten the computational time. Random forest (RF) is an ensemble learning method that constructs multiple decision trees. To improve the prediction accuracy of random forest, an advanced random forest is presented that uses Kriging models as the leaf-node models in all the decision trees. Following the Mean Decrease Accuracy (MDA) index based on out-of-bag (OOB) data, single-variable, group-variable, and correlated-variable importance measures are proposed to establish a complete VIM system on the basis of the advanced random forest. The link between MDA and the variance-based total sensitivity index is explored, and the corresponding relationships between the proposed VIM indices and the variance-based global sensitivity indices are then constructed, which gives a novel way to compute variance-based global sensitivity. Finally, several numerical and engineering examples are given to verify the effectiveness of the proposed VIM system and the validity of the established relationship.
Keywords: variable importance measure; random forest; variance-based global sensitivity; Kriging model
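The MDA index in result 2 measures how much a model's accuracy drops when one variable's values are randomly permuted in out-of-bag data. The following is a minimal stand-in under stated assumptions: a generic fitted regression function, a held-out set in place of per-tree OOB samples, and mean squared error in place of accuracy. It is not the Kriging-leaf random forest described in the paper, and all names are illustrative.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def mse(model: Model, X: List[List[float]], y: Sequence[float]) -> float:
    """Mean squared error of `model` on rows X with targets y."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: List[List[float]], y: Sequence[float],
                           feature: int, n_repeats: int = 10, seed: int = 0) -> float:
    """Average increase in MSE after shuffling column `feature`."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    total_increase = 0.0
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this variable and y
        X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        total_increase += mse(model, X_perm, y) - baseline
    return total_increase / n_repeats


def toy_model(x: Sequence[float]) -> float:
    return 3.0 * x[0]  # stand-in "fitted" model that ignores feature 1


X = [[float(i), float(i % 3)] for i in range(20)]
y = [toy_model(row) for row in X]
imp_0 = permutation_importance(toy_model, X, y, feature=0)
imp_1 = permutation_importance(toy_model, X, y, feature=1)
```

A variable the model never uses scores exactly zero. The used variable scores positive because shuffling it degrades the predictions.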
Landslide susceptibility mapping using machine learning algorithms and comparison of their performance at Abha Basin, Asir Region, Saudi Arabia (cited 9 times)
3
Authors: Ahmed Mohamed Youssef, Hamid Reza Pourghasemi. Geoscience Frontiers (SCIE, CAS, CSCD), 2021, Issue 2, pp. 639-655 (17 pages)
The current study aimed to evaluate the capabilities of seven advanced machine learning techniques (MLTs), namely Support Vector Machine (SVM), Random Forest (RF), Multivariate Adaptive Regression Spline (MARS), Artificial Neural Network (ANN), Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), and Naive Bayes (NB), for landslide susceptibility modeling, and to compare their performances. Coupling machine learning algorithms with spatial data types for landslide susceptibility mapping is a vitally important issue. This study was carried out using GIS and the open-source R software in the Abha Basin, Asir Region, Saudi Arabia. First, a total of 243 landslide locations were identified in the Abha Basin from different data sources to prepare the landslide inventory map. All landslide areas were randomly split into two groups, 70% for training and 30% for validation. Twelve landslide variables were generated for landslide susceptibility modeling: altitude, lithology, distance to faults, normalized difference vegetation index (NDVI), land use/land cover (LULC), distance to roads, slope angle, distance to streams, profile curvature, plan curvature, slope length (LS), and slope aspect. The area under the ROC curve (AUC-ROC) approach was applied to evaluate, validate, and compare the performance of the MLTs. The results indicated that AUC values for the seven MLTs range from 89.0% for QDA to 95.1% for RF. Our findings showed that RF (AUC = 95.1%) and LDA (AUC = 941.7%) produced the best performances compared with the other MLTs. The outcome of this study and the landslide susceptibility maps would be useful for environmental protection.
Keywords: landslide susceptibility; machine learning algorithms; variable importance; Saudi Arabia
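Result 3 splits 243 landslide locations 70/30 and compares models by AUC-ROC. The sketch below illustrates both steps. The AUC is computed as the probability that a randomly chosen positive cell outscores a randomly chosen negative one, with ties counting one half, which is equivalent to the Mann-Whitney U statistic. Encoding landslide cells as label 1 and non-landslide cells as 0 is an assumption, because the abstract does not describe how non-landslide samples were drawn. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(items: Sequence[T], train_frac: float = 0.7,
                           seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split items into training and validation groups."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as P(score of random positive > score of random negative);
    ties count 1/2 (Mann-Whitney formulation, O(n_pos * n_neg))."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Indices 0..242 stand in for the 243 landslide locations
train, validation = train_validation_split(range(243))
```

For example, `auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])` wins 3 of the 4 positive/negative pairs and returns 0.75.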
Improved prediction of slope stability using a hybrid stacking ensemble method based on finite element analysis and field data (cited 12 times)
4
Authors: Navid Kardani, Annan Zhou, Majidreza Nazem, Shui-Long Shen. Journal of Rock Mechanics and Geotechnical Engineering (SCIE, CSCD), 2021, Issue 1, pp. 188-201 (14 pages)
Slope failures lead to catastrophic consequences in numerous countries, and thus the stability assessment of slopes is of high interest in geotechnical and geological engineering research. A hybrid stacking ensemble approach is proposed in this study for enhancing the prediction of slope stability. In the hybrid stacking ensemble approach, we used an artificial bee colony (ABC) algorithm to find the best combination of base classifiers (level 0) and determined a suitable meta-classifier (level 1) from a pool of 11 individual optimized machine learning (OML) algorithms. Finite element analysis (FEA) was conducted to form the synthetic database for the training stage (150 cases) of the proposed model, while 107 real field slope cases were used for the testing stage. The results of the hybrid stacking ensemble approach were then compared with those obtained by the 11 individual OML methods using the confusion matrix, F1-score, and area under the curve (AUC score). The comparisons showed that a significant improvement in the prediction of slope stability was achieved by the hybrid stacking ensemble (AUC = 90.4%), which is 7% higher than the best of the 11 individual OML methods (AUC = 82.9%). A further comparison was then undertaken between the hybrid stacking ensemble method and a basic ensemble classifier on slope stability prediction. The results showed a clear performance advantage of the hybrid stacking ensemble method over the basic ensemble method. Finally, the importance of the variables for slope stability was studied using the linear vector quantization (LVQ) method.
Keywords: slope stability; machine learning (ML); stacking ensemble; variable importance; artificial bee colony (ABC)
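Result 4 compares classifiers using the confusion matrix and the F1-score. The sketch below shows how the F1-score follows from the confusion-matrix counts as the harmonic mean of precision TP/(TP+FP) and recall TP/(TP+FN). Treating "failed" slopes as the positive class (label 1) is an assumption made for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for a binary classifier."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = failed slope, 0 = stable slope
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
```

With these labels TP = 2, FP = 1, FN = 1 and TN = 2, so precision and recall are both 2/3 and F1 is 2/3.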
Gully erosion spatial modelling: Role of machine learning algorithms in selection of the best controlling factors and modelling process (cited 4 times)
5
Authors: Hamid Reza Pourghasemi, Nitheshnirmal Sadhasivam, Narges Kariminejad, Adrian L. Collins. Geoscience Frontiers (SCIE, CAS, CSCD), 2020, Issue 6, pp. 2207-2219 (13 pages)
This investigation assessed the efficacy of 10 widely used machine learning algorithms (MLA), comprising the least absolute shrinkage and selection operator (LASSO), generalized linear model (GLM), stepwise generalized linear model (SGLM), elastic net (ENET), partial least squares (PLS), ridge regression, support vector machine (SVM), classification and regression trees (CART), bagged CART, and random forest (RF), for gully erosion susceptibility mapping (GESM) in Iran. The locations of 462 previously existing gully erosion sites were mapped through widespread field investigations, and 70% (323) and 30% (139) of the observations were randomly assigned to algorithm calibration and validation, respectively. Twelve controlling factors for gully erosion, namely soil texture, annual mean rainfall, digital elevation model (DEM), drainage density, slope, lithology, topographic wetness index (TWI), distance from rivers, aspect, distance from roads, plan curvature, and profile curvature, were ranked in terms of their importance using each MLA. The MLA were compared on a training dataset for gully erosion using statistical measures such as RMSE (root mean square error), MAE (mean absolute error), and R-squared. Based on these comparisons, the RF algorithm exhibited the minimum RMSE and MAE and the maximum R-squared, and was therefore selected as the best model. The variable importance evaluation using the RF model revealed that distance from rivers had the highest significance in influencing the occurrence of gully erosion, whereas plan curvature had the least importance. According to the GESM generated using RF, most of the study area is predicted to have a low (53.72%) or moderate (29.65%) susceptibility to gully erosion, whereas only a small area is identified as having a high (12.56%) or very high (4.07%) susceptibility. The output of the RF model was validated using the ROC (receiver operating characteristic) curve approach, which returned an area under the curve (AUC) of 0.985, demonstrating the excellent forecasting ability of the model. The GESM prepared using the RF algorithm can aid decision-makers in targeting remedial actions for minimizing the damage caused by gully erosion.
Keywords: machine learning algorithm; gully erosion; random forest; controlling factors; variable importance
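Result 5 selects RF because it has the lowest RMSE and MAE and the highest R-squared. The three measures can be sketched as follows. R-squared is taken here as the coefficient of determination 1 - SS_res/SS_tot. Some tools instead report the squared Pearson correlation between predictions and observations, and the abstract does not say which definition was used, so treat this choice as an assumption.

```python
import math
from typing import Sequence, Tuple


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (RMSE, MAE, R-squared) with R-squared = 1 - SS_res / SS_tot."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return rmse, mae, 1.0 - ss_res / ss_tot


rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

In this toy case a single error of 1 gives RMSE = 0.5, MAE = 0.25 and R-squared = 1 - 1/5 = 0.8. Under the selection rule in the abstract, the best model minimizes the first two measures and maximizes the third.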
A neural network-based production process modeling and variable importance analysis approach in corn to sugar factory (cited 1 time)
6
Authors: Yi Tong, Mou Shu, Mingxin Li, Yingwei Liu, Ran Tao, Congcong Zhou, You Zhao, Guoxing Zhao, Yi Li, Yachao Dong, Lei Zhang, Linlin Liu, Jian Du. Frontiers of Chemical Science and Engineering (SCIE, EI, CSCD), 2023, Issue 3, pp. 358-371 (14 pages)
The corn-to-sugar process has long faced the risks of high energy consumption and thin profits. However, it is hard to upgrade or optimize the process based on mechanistic unit-operation models because of the high complexity of the related processes. Big data technology provides a promising solution owing to its ability to turn huge amounts of data into insights for operational decisions. In this paper, a neural network-based production process modeling and variable importance analysis approach is proposed for corn-to-sugar processes. It comprises data preprocessing, dimensionality reduction, modeling based on a multilayer perceptron, convolutional neural network, or recurrent neural network, and an extended weights connection method. In the established model, the dextrose equivalent value is selected as the output, and 654 sites from the DCS (distributed control system) are selected as the inputs. LASSO analysis is first applied to reduce the data dimension to 155, and the inputs are then reduced to 50 by genetic algorithm optimization. Finally, variable importance analysis is carried out with the extended weights connection method, and the 20 most important sites are selected for each neural network. The results indicate that the multilayer perceptron and recurrent neural network models have a relative error of less than 0.1%, giving better predictions than the other models, and that the 20 most important sites selected are more interpretable. The major contributions of this work can significantly aid high-accuracy process simulation modeling and process optimization based on the selected most important sites, helping to maintain high-quality and stable production in corn-to-sugar processes.
Keywords: big data; corn to sugar factory; neural network; variable importance analysis
Digital mapping of soil phosphorous sorption parameters (PSPs) using environmental variables and machine learning algorithms
7
Authors: Sanaz Saidi, Shamsollah Ayoubi, Mehran Shirvani, Kamran Azizi, Shuai Zhao. International Journal of Digital Earth (SCIE, EI), 2023, Issue 1, pp. 1752-1769 (18 pages)
In this study, soil phosphorus sorption parameters (PSPs) were predicted using different machine learning models: Cubist (Cu), random forest (RF), support vector machines (SVM), and Gaussian process regression (GPR). The results showed that topographic attributes alone were not adequate auxiliary variables for predicting the PSPs. However, remote sensing data, and their combination with soil properties, could be used reliably to predict PSPs (R² = 0.41 for MBC by the RF model, R² = 0.49 for PBC by the Cu model, R² = 0.37 for SPR by the Cu model, and R² = 0.38 for SBC by the RF model). The lowest RMSE values were obtained for MBC by the RF model, PBC by the SVM model, SPR by the Cubist model, and SBC by the RF model. The results also showed that remote sensing data, as easily available datasets, could reliably predict PSPs in the study area. The variable importance analysis revealed that cation exchange capacity (CEC) and clay content, among the soil properties, and B5/B7, Midindex, Coloration index, Saturation index, and OSAVI, among the remote sensing indices, were the most important factors for predicting PSPs. Further studies are recommended to use other proximally sensed data to improve PSP prediction and support precise decision-making across the landscape.
Keywords: soil fertility; random forest; adsorption isotherms; remote sensing; variable importance analysis
Machine Learning and Regression Analysis Reveal Different Patterns of Influence on Net Ecosystem Exchange at Two Conifer Woodland Sites
8
Authors: David A. Wood. Research in Ecology, 2022, Issue 2, pp. 24-50 (27 pages)
Variations in the net ecosystem exchange (NEE) of carbon dioxide, and in the variables influencing it, at woodland sites over multiple years determine the long-term performance of those sites as carbon sinks. In this study, weekly-averaged data from two AmeriFlux sites of evergreen woodland in North America, located in different climatic zones and with distinct tree and understory species, are evaluated using four multi-linear regression (MLR) and seven machine learning (ML) models. The site data extend over multiple years and conform to the FLUXNET2015 pre-processing pipeline. Twenty influencing variables are considered for site CA-LP1 and sixteen for site US-Mpj. Rigorous k-fold cross-validation analysis verifies that all eleven models assessed generate reproducible NEE predictions, with varying degrees of accuracy. At both sites, the best-performing ML models (support vector regression (SVR), extreme gradient boosting (XGB), and multi-layer perceptron (MLP)) substantially outperform the MLR models in NEE prediction. The ML models also generate predicted-versus-measured NEE distributions whose cross-plot trends approximately pass through the origin, confirming that they capture the actual NEE trend more realistically. The MLR and ML models assign some level of importance to all influential variables measured, but the degree of influence varies between the two sites. For the best-performing SVR models at site CA-LP1, the variables air temperature, outgoing shortwave radiation, net radiation, outgoing longwave radiation, incoming shortwave radiation, and vapor pressure deficit have the most influence on NEE predictions. At site US-Mpj, the variables vapor pressure deficit, incoming shortwave radiation, incoming longwave radiation, air temperature, incoming photosynthetic photon flux density, outgoing shortwave radiation, and precipitation exert the most influence on the model solutions. Sensible heat exerts very low influence at both sites. The methodology successfully determines the relative importance of the influential variables in determining weekly NEE trends at both conifer woodland sites studied.
Keywords: eddy covariance; FLUXNET2015; weekly NEE trends; variable importance; correlation comparisons; NEE prediction
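Result 8 uses k-fold cross-validation to check that its eleven models give reproducible NEE predictions. The minimal sketch below generates index splits in which every record is tested exactly once. The shuffled, interleaved fold assignment is an assumption; the abstract does not say how folds were formed. For weekly time series, contiguous (blocked) folds may be preferable because they limit leakage between neighboring weeks.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


splits = list(k_fold_indices(10, 5))
```

Each model would be refit on every `train` list and scored on the matching `test` list. The spread of the k scores indicates how reproducible the predictions are.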
Variable importance-weighted Random Forests (cited 3 times)
9
Authors: Yiyi Liu, Hongyu Zhao. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2017, Issue 4, pp. 338-351 (14 pages)
Background: Random Forests is a popular classification and regression method that has proven powerful for various prediction problems in biological studies. However, its performance often deteriorates when the number of features increases. To address this limitation, feature elimination Random Forests was proposed, which only uses the features with the largest variable importance scores. Yet the performance of this method is not satisfactory, possibly because of its rigid feature selection and the increased correlations between the trees of the forest. Methods: We propose variable importance-weighted Random Forests. Instead of sampling features with equal probability at each node when building trees, it samples features according to their variable importance scores and then selects the best split from the randomly selected features. Results: We evaluate the performance of our method through comprehensive simulation and real data analyses, for both regression and classification. Compared with the standard Random Forests and feature elimination Random Forests methods, our proposed method has improved performance in most cases. Conclusions: By incorporating the variable importance scores into the random feature selection step, our method can better utilize more informative features without completely ignoring less informative ones. It therefore achieves improved prediction accuracy in the presence of weak signals and large noise. We have implemented an R package "viRandomForests" based on the original R package "randomForest", and it can be freely downloaded from http://zhaocenter.org/software.
Keywords: Random Forests; variable importance score; classification; regression
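The key change in result 9 is at the node level: candidate features are drawn with probability tied to their importance scores instead of uniformly. The sketch below shows such a draw of `mtry` distinct features, using sequential weighted sampling without replacement via Python's `random.Random.choices`. It assumes the scores are non-negative. Features with a score of exactly zero are never drawn here, so keeping low-importance features in play (as the abstract intends) would require strictly positive scores, for example by adding a small constant; that adjustment is an assumption, not taken from the paper. This is not the viRandomForests implementation, and the function name is hypothetical.

```python
import random
from typing import List, Sequence


def weighted_feature_sample(importances: Sequence[float], mtry: int,
                            rng: random.Random) -> List[int]:
    """Draw `mtry` distinct feature indices without replacement; each draw picks
    among the remaining features with probability proportional to importance."""
    if any(w < 0 for w in importances):
        raise ValueError("importance scores must be non-negative")
    remaining = [i for i, w in enumerate(importances) if w > 0]
    if mtry > len(remaining):
        raise ValueError("mtry exceeds the number of features with positive weight")
    chosen: List[int] = []
    for _ in range(mtry):
        weights = [importances[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # sample without replacement
    return chosen


rng = random.Random(1)
candidates = weighted_feature_sample([5.0, 1.0, 0.0, 2.0], mtry=2, rng=rng)
```

A tree builder would then search for the best split only among `candidates`, exactly as standard Random Forests does with its uniformly drawn candidates.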
Energy and carbon performance of urban buildings using metamodeling variable importance techniques (cited 1 time)
10
Authors: Yunliang Liu, Wei Tian, Xiang Zhou. Building Simulation (SCIE, EI, CSCD), 2021, Issue 3, pp. 535-547 (13 pages)
Global urbanization causes more environmental stresses in cities, and energy efficiency is one of the major concerns for urban sustainability. Variable importance techniques have been widely used in building energy analysis to determine the key factors influencing building energy use. Most of these applications, however, use only one type of variable importance approach. Therefore, this paper proposes a procedure that conducts two types of variable importance analysis (predictive and variance-based) to determine robust and effective energy-saving measures in urban buildings. Both variable importance methods are metamodeling techniques, which can significantly reduce the computational cost of building energy simulation models for urban buildings. The predictive importance analysis uses the prediction errors of metamodels to obtain importance rankings of inputs, while the variance-based variable importance can explore non-linear effects and interactions among input variables based on variance decomposition. Campus buildings are used to demonstrate the application of the proposed method to explore the characteristics of heating energy, cooling energy, electricity, and carbon emissions of buildings. The results indicate that combining the two types of metamodeling variable importance analysis can provide fast and robust analysis to improve the energy efficiency of urban buildings. Carbon emissions can be reduced by approximately 30% after adopting a few effective energy efficiency measures, and more aggressive measures can lead to a 60% reduction in carbon emissions. Moreover, this research demonstrates the application of parallel computing to expedite building energy analysis in urban environments, since multi-core computers are becoming increasingly available.
Keywords: urban buildings; variable importance; metamodeling; energy performance; carbon emissions
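Variance-based importance, as used in result 10 (and as the link target of the MDA index in result 2), splits Var(Y) into input contributions. The first-order index is S_i = Var(E[Y | X_i]) / Var(Y). The total index S_Ti additionally counts all interactions involving X_i. The sketch below estimates both with Monte Carlo pick-freeze estimators: Saltelli's estimator for the first-order index and Jansen's for the total index. A cheap function stands in for a building-energy metamodel, and the inputs are assumed independent and uniform on [0, 1]. For Y = X1 + 2·X2 the analytic values are S1 = 0.2 and S2 = 0.8, and the total indices equal the first-order ones because the model is additive.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sobol_indices(model: Callable[[Sequence[float]], float], n_inputs: int,
                  n_samples: int = 50000, seed: int = 0) -> List[Tuple[float, float]]:
    """Return (first_order, total) Sobol index estimates for each input,
    assuming independent inputs uniformly distributed on [0, 1]."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    f_A = [model(a) for a in A]
    f_B = [model(b) for b in B]
    pooled = f_A + f_B
    mean = sum(pooled) / len(pooled)
    var_y = sum((v - mean) ** 2 for v in pooled) / len(pooled)
    indices = []
    for i in range(n_inputs):
        # "Pick-freeze" sample: rows of A with column i taken from B
        f_ABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Saltelli (2010) first-order and Jansen total-effect estimators
        first = sum(fb * (fab - fa) for fa, fb, fab in zip(f_A, f_B, f_ABi)) / n_samples
        total = sum((fa - fab) ** 2 for fa, fab in zip(f_A, f_ABi)) / (2 * n_samples)
        indices.append((first / var_y, total / var_y))
    return indices


def toy_model(x: Sequence[float]) -> float:
    return x[0] + 2.0 * x[1]  # analytic indices: S1 = 0.2, S2 = 0.8


estimates = sobol_indices(toy_model, n_inputs=2)
```

A large gap between an input's total and first-order index would signal interactions, which is the kind of non-linear effect the abstract says the variance-based analysis can reveal.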
Wildfire Susceptibility Assessment in Southern China: A Comparison of Multiple Methods (cited 3 times)
11
Authors: Yinxue Cao, Ming Wang, Kai Liu. International Journal of Disaster Risk Science (SCIE, CSCD), 2017, Issue 2, pp. 164-181 (18 pages)
Wildfire is a primary forest disturbance. A better understanding of wildfire susceptibility and its dominant influencing factors is crucial for regional wildfire risk management. This study performed a wildfire susceptibility assessment using multiple methods, including logistic regression, probit regression, an artificial neural network, and a random forest (RF) algorithm, with Yunnan Province, China, as the case study area. We investigated the sample ratio of ignition to nonignition data to avoid misleading results caused by the overwhelming number of nonignition samples in the models. To compare model performance and the importance of variables among the models, the area under the curve (AUC) of the receiver operating characteristic plot was used as an indicator. The results show that a cost-sensitive RF had the highest accuracy (88.47%) for all samples, and 94.23% accuracy for ignition prediction. The main factors identified as influencing wildfire occurrence in Yunnan were forest coverage ratio, month, season, surface roughness, the 10-day minimum of the 6-h maximum humidity, and the 10-day maxima of the 6-h average and maximum temperatures. These seven variables made the greatest contributions to regional wildfire susceptibility. Susceptibility maps developed from the models provide information on the spatial variation of ignition susceptibility, which can be used in regional wildfire risk management.
Keywords: China; random forest; variable importance rank; wildfire susceptibility; Yunnan forest
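Result 11 adjusts the ratio of ignition to nonignition samples so that the much larger nonignition class does not dominate training. The abstract does not describe the mechanism. One simple option is random undersampling, sketched below: keep every ignition record and a random subset of nonignition records at a chosen ratio. The cost-sensitive RF reported as best instead reweights misclassification costs, which this sketch does not implement. The function name and toy records are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_at_ratio(positives: Sequence[T], negatives: Sequence[T],
                    neg_per_pos: float, seed: int = 0) -> List[T]:
    """Keep all positives plus a random subset of negatives, so there are about
    `neg_per_pos` negatives per positive (capped by the negatives available)."""
    k = min(len(negatives), int(neg_per_pos * len(positives)))
    subset = random.Random(seed).sample(list(negatives), k)  # without replacement
    return list(positives) + subset


ignitions = [("ignition", i) for i in range(10)]
non_ignitions = [("none", i) for i in range(1000)]
balanced = sample_at_ratio(ignitions, non_ignitions, neg_per_pos=1.0)
```

Repeating the model fit for several values of `neg_per_pos` and comparing validation AUC is one way to study the sample ratio, as the abstract describes.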
Energy characteristics of urban buildings: Assessment by machine learning (cited 3 times)
12
Authors: Wei Tian, Chuanqi Zhu, Yu Sun, Zhanyong Li, Baoquan Yin. Building Simulation (SCIE, EI, CSCD), 2021, Issue 1, pp. 179-193 (15 pages)
Machine learning techniques have attracted increasing attention as advanced data analytics in building energy analysis. However, most previous studies focus only on the ability of machine learning algorithms to provide reliable energy estimates for buildings. Beyond model prediction, machine learning also has great potential to identify energy patterns of urban buildings. Therefore, this paper explores the energy characteristics of London domestic properties using ten machine learning algorithms from three aspects: the tuning process of the learning models, variable importance, and spatial analysis of model discrepancy. The results indicate that combining these three aspects can provide insights into energy patterns of urban buildings. The tuning process of these models indicates that gas use models in London should have more terms than electricity models, and that interaction terms should be considered in both gas and electricity models. The rankings of important variables differ greatly between gas and electricity prediction in London residential buildings, which suggests that gas and electricity use are affected by different physical and social factors. Moreover, the importance levels of these key variables differ markedly between gas and electricity consumption. Far more variables have importance levels over 40 for electricity use than for gas use. The areas with larger model discrepancies can be identified using local spatial analysis based on these machine learning models. These identified areas have significantly different energy patterns for gas and electricity use. More research is required to understand these unusual patterns of energy use in these areas.
Keywords: urban buildings; energy characteristics; machine learning; variable importance; spatial analysis
A two-stage variable selection strategy for supersaturated designs with multiple responses (cited 1 time)
13
Authors: Yuhui Yin, Qiaozhen Zhang, Min-Qian Liu. Frontiers of Mathematics in China (SCIE, CSCD), 2013, Issue 3, pp. 717-730 (14 pages)
A supersaturated design (SSD), whose run size is too small to estimate all the main effects, is commonly used in screening experiments. It offers a potentially useful tool for investigating a large number of factors with only a few experimental runs. Many authors have proposed analysis methods to identify active effects in situations where only one response is considered. However, two or more responses are often observed simultaneously in one screening experiment, so methods for analyzing SSDs with multiple responses are needed. In this paper, we propose a two-stage variable selection strategy, called the multivariate partial least squares-stepwise regression (MPLS-SR) method, which uses multivariate partial least squares regression in conjunction with a stepwise regression procedure to select the truly active effects in SSDs with multiple responses. Simulation studies show that the MPLS-SR method performs well and is easy to understand and implement.
Keywords: multivariate partial least squares (MPLS); supersaturated design (SSD); stepwise regression; variable selection; variable importance in projection
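Result 13 lists variable importance in projection (VIP) as a keyword but does not say exactly how it enters MPLS-SR. One common use is to screen factors after the first-stage PLS fit, often keeping those with VIP above 1 (a widely used rule of thumb), before stepwise regression. The sketch below computes VIP from an already-fitted model. It takes a p × A weight matrix W, with one column w_a per component, and the response sum of squares SS_a explained by each component, summed over responses in the multivariate case. The formula is VIP_j = sqrt(p · Σ_a SS_a (w_ja/‖w_a‖)² / Σ_a SS_a). Fitting the PLS model is not shown, and the toy inputs are hypothetical.

```python
import math
from typing import List, Sequence


def vip_scores(weights: Sequence[Sequence[float]],
               ss_per_component: Sequence[float]) -> List[float]:
    """VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a).

    weights: p x A matrix (one row per factor, one column per PLS component).
    ss_per_component: response sum of squares explained by each component
    (summed over all responses when there are several)."""
    p = len(weights)
    n_comp = len(ss_per_component)
    norms = [math.sqrt(sum(weights[j][a] ** 2 for j in range(p))) for a in range(n_comp)]
    total_ss = sum(ss_per_component)
    scores = []
    for j in range(p):
        s = sum(ss_per_component[a] * (weights[j][a] / norms[a]) ** 2 for a in range(n_comp))
        scores.append(math.sqrt(p * s / total_ss))
    return scores


# Toy weights: 3 factors, 2 components; factor 2 loads on neither component
vip = vip_scores([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [3.0, 1.0])
```

Because each normalized weight column has unit length, the squared VIP scores always average to 1. This is why 1 is a natural cut-off: in the toy case factor 0 scores 1.5, factor 1 about 0.87, and factor 2 scores 0.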