Funding: Financially supported by the National Key R&D Program of China (2021YFD220040403 and 2021YFD220040304) and the China Scholarship Council (202107565021).
Abstract: Background: Vegetation distribution maps are of great significance for nature protection and management. In diverse tropical forests, accurate spatial mapping of vegetation types is challenging: the high species diversity and abundance of rare species challenge classification concepts, while remote sensing signals may not vary systematically with species composition, which complicates the delineation of vegetation types in the landscape. Methods: We used a combination of field-based compositional data and their relations to environmental variables to predict the distribution of forest types in the Wuzhishan National Natural Reserve (WNNR), Hainan Island, China, using multivariate regression trees (MRT). The MRT was based on arboreal vegetation composition in 132 plots of 20 m × 20 m with a regular spacing of 1 km. In addition to the MRT, non-metric multidimensional scaling (NMDS) was used to evaluate vegetation-environment relationships. Results: The MRT model worked best when using 14 key environmental variables covering topography, climate, latitude and soil, although the difference from the simpler model including only topographical variables was small. The full model classified the 132 plots into 3 vegetation types, 6 formation groups, 20 formations and 65 associations at different hierarchical syntaxonomic levels. This model was the basis for forest vegetation maps for the WNNR. MRT and NMDS showed that elevation was the main driving force for the distribution of vegetation types and formation groups. Climate, latitude, and soil (especially available P), together with topographic variables, all influenced the distribution of formations and associations. Conclusions: While elevation determines forest-type distributions, lower-level syntaxonomic forest classes respond to the topographic diversity typical of mountains. Apart from providing the first detailed forest vegetation map for any part of WNNR, we show how, in spite of limitations, MRT with existing environmental data can be a useful method for mapping diverse and remote tropical forests.
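As a rough illustration of the splitting criterion that multivariate regression trees rely on, the sketch below scans one environmental variable for the threshold that minimises the summed within-group squared deviation over all species at once. The elevation and abundance values are invented for illustration and are not taken from the WNNR plots; the function names are hypothetical, and a full MRT would repeat this search over every variable and recurse on each resulting group.

```python
def multivariate_sse(rows):
    """Sum of squared deviations from the column means, over all response columns."""
    if not rows:
        return 0.0
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    return sum((row[j] - means[j]) ** 2 for row in rows for j in range(k))


def best_multivariate_split(x, Y):
    """Find the threshold on one predictor that minimises the pooled multivariate SSE.

    x: predictor value per plot; Y: one list of species abundances per plot.
    Returns (threshold, pooled_sse), or (None, inf) if x has no two distinct values.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_threshold, best_sse = None, float("inf")
    for pos in range(1, len(order)):
        lo, hi = x[order[pos - 1]], x[order[pos]]
        if lo == hi:
            continue  # cannot split between identical predictor values
        left = [Y[i] for i in order[:pos]]
        right = [Y[i] for i in order[pos:]]
        sse = multivariate_sse(left) + multivariate_sse(right)
        if sse < best_sse:
            best_threshold, best_sse = (lo + hi) / 2, sse
    return best_threshold, best_sse


# Invented example: six plots, elevation in metres, abundances of two species.
elevation = [300, 450, 600, 1100, 1250, 1400]
abundance = [[8, 0], [7, 1], [9, 0], [1, 6], [0, 7], [2, 8]]
threshold, sse = best_multivariate_split(elevation, abundance)
print(f"split at {threshold} m, pooled SSE {sse:.2f}")
```

Because the criterion pools all species, a single split separates plots by overall composition rather than by any one species.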
Funding: Supported by the China Earthquake Administration, Institute of Seismology Foundation (IS201526246).
Abstract: Using groundwater level monitoring data from the Shuping landslide in the Three Gorges Reservoir area, the factors influencing groundwater level were selected on the basis of the response relationship between candidate factors, such as rainfall and reservoir level, and the change in groundwater level. A classification and regression tree (CART) model was then constructed from the selected subset of factors and used to predict the groundwater level. In verification, the predictions for the test sample were consistent with the measured values, with a mean absolute error and relative error of 0.28 m and 1.15%, respectively. For comparison, a support vector machine (SVM) model constructed using the same set of factors gave a mean absolute error and relative error of 1.53 m and 6.11%, respectively. This indicates that the CART model not only has better fitting and generalization ability, but also offers strong advantages in analyzing the dynamic characteristics of landslide groundwater and in screening important variables. It is an effective method for predicting groundwater level in landslides.
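The two error measures quoted above can be computed as in the following minimal sketch. The abstract does not define "relative error", so the mean of |observed − predicted| / |observed| is an assumption here, and the water levels are invented values rather than Shuping monitoring data.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference, in the units of the data (metres here)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_relative_error(observed, predicted):
    """Average of |o - p| / |o| (assumed definition); observed values must be non-zero."""
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


# Invented groundwater levels (m) for illustration only.
observed = [170.0, 172.5, 175.0, 168.0]
predicted = [170.3, 172.1, 175.2, 168.3]

mae = mean_absolute_error(observed, predicted)
mre = mean_relative_error(observed, predicted)
print(f"MAE = {mae:.2f} m, mean relative error = {100 * mre:.2f}%")
```

Reporting both is useful because the absolute error carries physical units while the relative error allows comparison across sites with different water-level ranges.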
Funding: The National Natural Science Foundation of China (No. 51708110).
Abstract: To address the poor generalization ability of the back-propagation (BP) neural network in model updating hybrid tests, a novel method, the AdaBoost regression tree algorithm, is introduced into the model updating procedure. During the learning phase, a regression tree is selected as the weak regression model to be trained, and multiple trained weak regression models are then integrated into a strong regression model. Finally, the training results are generated through voting by all the selected regression models. A 2-DOF nonlinear structure was numerically simulated using the online AdaBoost regression tree algorithm, with the BP neural network algorithm for comparison. The results show that the prediction accuracy of the online AdaBoost regression tree algorithm is 48.3% higher than that of the BP neural network algorithm, which verifies that the online AdaBoost regression tree algorithm generalizes better than the BP neural network algorithm. Furthermore, it can effectively eliminate the influence of weight initialization and improve the prediction accuracy of the restoring force in hybrid tests.
Abstract: Increased competition, economic recession and financial crises have raised the rate of business failure, and researchers have therefore attempted to develop new approaches that yield more accurate and more reliable results. The classification and regression tree (CART) is one of the modeling techniques developed for this purpose. In this study, the CART method is explained and its power to predict financial failure is tested. CART is applied to data from industrial companies traded on the Istanbul Stock Exchange (ISE) between 1997 and 2007. The study finds that CART has high predictive power for financial failure one, two and three years prior to failure, and that profitability ratios are the most important ratios in predicting failure.
Abstract: Urban grid power forecasting is one of the important tasks of power system operators, and it helps to analyze the development trend of a city. The demand for electricity in various industries is affected by many factors, yet data on the relevant influencing factors are scarce, which leads to large deviations in prediction accuracy. To improve the prediction results, this paper proposes a model based on Multi-Target Tree Regression to predict the monthly electricity consumption of different industrial structures. Because the actual electricity consumption data for Shanghai from 2013 to the first half of 2017 have few features, we collect data on GDP growth, weather conditions, and tourism season distribution for various industries in Shanghai, and model and train on the electricity consumption data of different industries in different months. The multi-target tree regression model was tested against actual values to verify its reliability and was used to predict the monthly electricity consumption of each industry in the second half of 2017. The experimental results show that the model can accurately predict the monthly electricity consumption of various industries.
Funding: This research has been supported by the US National Science Foundation under grant IIS-1115417, the National Natural Science Foundation of China under grants 61728205 and 61472267, and the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou under grant SZS201609.
Abstract: Multi-target regression is concerned with the simultaneous prediction of multiple continuous target variables based on the same set of input variables. It has received relatively little attention from the machine learning community, although it arises in many real-world applications. In this paper we conduct extensive experiments to investigate the performance of three representative multi-target regression learning algorithms, namely Multi-Target Stacking (MTS), Random Linear Target Combination (RLTC), and Multi-Objective Random Forest (MORF), comparing them with baseline single-target learning. Our experimental results show that all three multi-target regression learning algorithms improve on the performance of single-target learning. Among them, MTS performs best, followed by RLTC and then MORF. However, single-target learning sometimes still performs very well, and is occasionally the best. This analysis sheds light on multi-target regression learning and indicates that single-target learning is a competitive baseline on multi-target domains.
Abstract: Understanding the underlying structure of phylogenetic trees is very important because it informs the choice of methods for phylogenetic inference. The methods used under a structured population differ from those needed when a population is not structured. In this paper, we compared two supervised machine learning techniques, artificial neural network (ANN) and logistic regression models, for predicting the underlying structure of phylogenetic trees. We carried out parameter tuning to identify optimal models, and then performed 10-fold cross-validation on the optimal models for both logistic regression and ANN. We also applied an unsupervised technique, clustering, to determine the number of clusters that could be identified from simulated phylogenetic trees. The trees were from both structured and non-structured populations. Clustering and classification-based prediction were done using tree statistics such as the Colless, Sackin and cophenetic indices, among others. Results from 10-fold cross-validation revealed that logistic regression and ANN models performed comparably, with both having average accuracy rates of over 0.75. Most of the clustering indices used gave 2 or 3 as the optimal number of clusters.
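The 10-fold cross-validation step mentioned above can be sketched as follows. This is a generic illustration, not the authors' pipeline: the threshold classifier stands in for logistic regression or an ANN, the single feature stands in for a tree statistic such as the Colless index, and all names and data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once.

    Assumes n_samples >= k so that no fold is empty.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(fit, X, y, k=10, seed=0):
    """Average per-fold accuracy; fit(X_train, y_train) must return a predict function."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


def fit_threshold(X, y):
    """Placeholder classifier: threshold halfway between the two class means."""
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    midpoint = (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2
    return lambda x: int(x > midpoint)


# Hypothetical one-feature data: label 1 marks a "structured" population.
X = [float(i) for i in range(20)]
y = [int(x >= 10) for x in X]
accuracy = cross_val_accuracy(fit_threshold, X, y)
print(f"mean 10-fold accuracy: {accuracy:.2f}")
```

Averaging over folds, as the abstract does when it reports accuracy above 0.75, gives an estimate that does not depend on one particular train/test split.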
Abstract: Challenges in Big Data analysis arise from the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, namely Boosted Regression Trees (BRT), can address Big Data challenges to drive decision making. The challenge in this study is a lack of interoperability, since the data (a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information) are stored in monolithic hardware components. For the modelling process, it was necessary to create one common input file. Merging the data sources produced a structured but noisy input file containing inconsistencies and redundancies. Here, it is shown that BRT can process different data granularities, heterogeneous data and missingness. In particular, BRT deals with missing data by default, by allowing a split on whether or not a value is missing as well as on what the value is. Most importantly, BRT offers a wide range of possibilities for interpreting results, and variable selection is performed automatically by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and the Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms these in this instance. BRT can also be a starting point for sophisticated hierarchical modelling in real-world scenarios. For example, a single or ensemble BRT approach could be tested alongside existing models in order to improve results for a wide range of data-driven decisions and applications.
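To make the boosting idea concrete, here is a minimal sketch of boosted regression trees for squared-error loss, using one-split trees (stumps) on a single feature. It is a simplification under stated assumptions: it omits the missing-value splits, multiple predictors and variable-importance counts described in the abstract, and the function names, data and settings (`n_trees`, `learning_rate`) are hypothetical.

```python
def fit_stump(x, residuals):
    """Best one-split tree on a single feature, chosen by squared error."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for pos in range(1, len(order)):
        lo, hi = x[order[pos - 1]], x[order[pos]]
        if lo == hi:
            continue  # no threshold fits between identical values
        left = [residuals[i] for i in order[:pos]]
        right = [residuals[i] for i in order[pos:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, left_mean, right_mean)
    if best is None:
        raise ValueError("feature has no two distinct values to split on")
    return best[1:]


def fit_boosted_stumps(x, y, n_trees=200, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        stumps.append((threshold, left_value, right_value))
        # Shrink each stump's contribution by the learning rate.
        predictions = [
            pi + learning_rate * (left_value if xi <= threshold else right_value)
            for xi, pi in zip(x, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if xi <= threshold else right for threshold, left, right in stumps
    )


# Invented data: a smooth nonlinear response on ten points.
x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]
model = fit_boosted_stumps(x, y)
mse = sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)
baseline = sum((sum(y) / len(y) - yi) ** 2 for yi in y) / len(y)
print(f"training MSE: boosted {mse:.3f} vs mean-only {baseline:.3f}")
```

Counting how often each variable supplies the chosen threshold across the stumps would give the split-frequency measure of variable importance that the abstract refers to.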
Abstract: The effect of pruning severity on tree growth was analyzed by change point detection using segmented regression. The present study applied this analysis to a well-known published data set including diameter growth response, tree age, pruning severity and pretreatment crown size. First, multiple regression analysis was performed to assess the effect of tree age, pruning severity and pretreatment crown size on diameter growth response. Next, segmented regression analysis was performed to assess the effect of pruning severity on diameter growth response. The multiple regression showed that diameter growth response was significantly influenced by pruning severity and pretreatment crown size. The segmented regression showed that, in the whole data set, an abrupt change toward a decrease in diameter growth response was detected at 25% of the live crown removed. However, in the group of fully crowned, open-grown trees, diameter growth response decreased continuously with increasing pruning severity with no significant abrupt change, whereas in the group with 70% to 90% live crown, diameter growth response did not decrease significantly up to the break point (53% of crown removed) and then decreased abruptly. This may be the first study to evaluate numerically the effect of pruning severity on tree growth by change point analysis.
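One common way to locate such a break point is to fit a continuous two-segment line, y = a + b·x + c·max(0, x − ψ), for each candidate ψ and keep the candidate with the smallest residual sum of squares. The sketch below does this by grid search; it is an assumed approach for illustration, not necessarily the estimation procedure used in the study, and the data are synthetic with a break deliberately placed at 50% crown removed.

```python
def solve_linear_system(A, b):
    """Gaussian elimination with partial pivoting; returns None if A is singular."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (M[r][n] - tail) / M[r][r]
    return solution


def fit_hinge(x, y, psi):
    """Least-squares fit of y = a + b*x + c*max(0, x - psi) for a fixed break psi."""
    features = [(1.0, xi, max(0.0, xi - psi)) for xi in x]
    A = [[sum(f[i] * f[j] for f in features) for j in range(3)] for i in range(3)]
    b = [sum(f[i] * yi for f, yi in zip(features, y)) for i in range(3)]
    coef = solve_linear_system(A, b)
    if coef is None:
        return None, float("inf")
    sse = sum((yi - sum(c * fi for c, fi in zip(coef, f))) ** 2
              for f, yi in zip(features, y))
    return coef, sse


def find_breakpoint(x, y, candidates):
    """Return (psi, coefficients, sse) for the best candidate, or None if none fits."""
    best = None
    for psi in candidates:
        coef, sse = fit_hinge(x, y, psi)
        if coef is not None and (best is None or sse < best[2]):
            best = (psi, coef, sse)
    return best


# Synthetic data: growth response flat up to 50% crown removed, then declining.
severity = [float(v) for v in range(0, 100, 10)]
growth = [100.0 - 2.0 * max(0.0, s - 50.0) for s in severity]
psi, (intercept, slope_before, slope_change), sse = find_breakpoint(
    severity, growth, candidates=range(10, 90, 10)
)
print(f"estimated break point: {psi}% crown removed")
```

Here `slope_before` is the slope below the break and `slope_before + slope_change` the slope above it; testing whether `slope_change` differs significantly from zero, which this sketch does not do, is what distinguishes an abrupt change from a continuous decline.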
Funding: Supported by the National Natural Science Foundation (60173046) and the Natural Science Foundation of Province (2002AB040).
Abstract: A new point-tree data structure genetic programming (PTGP) method is proposed. For the discontinuous function regression problem, the proposed method can identify the function structure and the discontinuity points simultaneously. It can also easily be applied to continuous function regression problems. The results of numerical experiments demonstrate that point-tree GP is an efficient alternative for complex function identification problems.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12071173 and 12171192) and the Huaian Key Laboratory for Infectious Diseases Control and Prevention (HAP201704).
Abstract: Plant epidemics are often associated with weather-related variables, but it is difficult to identify weather-related predictors for models that predict plant epidemics. In the article by Shah et al., the authors predicted Fusarium head blight (FHB) epidemics of wheat by exploring a functional approach, using scalar-on-function regression to model a binary outcome (FHB epidemic or non-epidemic) with respect to weather time series spanning 140 days relative to anthesis. The scalar-on-function models fit the data better than previously described logistic regression models. In this work, given the same dataset and models, we attempt to reproduce the article by Shah et al. using a different approach, boosted regression trees. After fitting, the classification accuracy and model statistics are surprisingly good.