The increase of competition, economic recession and financial crises has increased business failure and depending on this the researchers have attempted to develop new approaches which can yield more correct and more ...The increase of competition, economic recession and financial crises has increased business failure and depending on this the researchers have attempted to develop new approaches which can yield more correct and more reliable results. The classification and regression tree (CART) is one of the new modeling techniques which is developed for this purpose. In this study, the classification and regression trees method is explained and tested the power of the financial failure prediction. CART is applied for the data of industry companies which is trade in Istanbul Stock Exchange (ISE) between 1997-2007. As a result of this study, it has been observed that, CART has a high predicting power of financial failure one, two and three years prior to failure, and profitability ratios being the most important ratios in the prediction of failure.展开更多
According to groundwater level monitoring data of Shuping landslide in the Three Gorges Reservoir area, based on the response relationship between influential factors such as rainfall and reservoir level and the chang...According to groundwater level monitoring data of Shuping landslide in the Three Gorges Reservoir area, based on the response relationship between influential factors such as rainfall and reservoir level and the change of groundwater level, the influential factors of groundwater level were selected. Then the classification and regression tree(CART) model was constructed by the subset and used to predict the groundwater level. Through the verification, the predictive results of the test sample were consistent with the actually measured values, and the mean absolute error and relative error is 0.28 m and 1.15%respectively. To compare the support vector machine(SVM) model constructed using the same set of factors, the mean absolute error and relative error of predicted results is 1.53 m and 6.11% respectively. It is indicated that CART model has not only better fitting and generalization ability, but also strong advantages in the analysis of landslide groundwater dynamic characteristics and the screening of important variables. It is an effective method for prediction of ground water level in landslides.展开更多
The sub-pixel impervious surface percentage(SPIS) is the fraction of impervious surface area in one pixel,and it is an important indicator of urbanization.Using remote sensing data,the spatial distribution of SPIS val...The sub-pixel impervious surface percentage(SPIS) is the fraction of impervious surface area in one pixel,and it is an important indicator of urbanization.Using remote sensing data,the spatial distribution of SPIS values over large areas can be extracted,and these data are significant for studies of urban climate,environment and hydrology.To develop a stabilized,multi-temporal SPIS estimation method suitable for typical temperate semi-arid climate zones with distinct seasons,an optimal model for estimating SPIS values within Beijing Municipality was built that is based on the classification and regression tree(CART) algorithm.First,models with different input variables for SPIS estimation were built by integrating multi-source remote sensing data with other auxiliary data.The optimal model was selected through the analysis and comparison of the assessed accuracy of these models.Subsequently,multi-temporal SPIS mapping was carried out based on the optimal model.The results are as follows:1) multi-seasonal images and nighttime light(NTL) data are the optimal input variables for SPIS estimation within Beijing Municipality,where the intra-annual variability in vegetation is distinct.The different spectral characteristics in the cultivated land caused by the different farming characteristics and vegetation phenology can be detected by the multi-seasonal images effectively.NLT data can effectively reduce the misestimation caused by the spectral similarity between bare land and impervious surfaces.After testing,the SPIS modeling correlation coefficient(r) is approximately 0.86,the average error(AE) is approximately 12.8%,and the relative error(RE) is approximately 0.39.2) The SPIS results have been divided into areas with high-density impervious cover(70%–100%),medium-density impervious cover(40%–70%),low-density impervious cover(10%–40%) and natural cover(0%–10%).The SPIS model performed better in estimating values for high-density urban areas than other categories.3) Multi-temporal SPIS mapping(1991–2016) was conducted based on the optimized SPIS results for 2005.After testing,AE ranges from 12.7% to 15.2%,RE ranges from 0.39 to 0.46,and r ranges from 0.81 to 0.86.It is demonstrated that the proposed approach for estimating sub-pixel level impervious surface by integrating the CART algorithm and multi-source remote sensing data is feasible and suitable for multi-temporal SPIS mapping of areas with distinct intra-annual variability in vegetation.展开更多
This paper presents a supervised learning algorithm for retinal vascular segmentation based on classification and regression tree (CART) algorithm and improved adptive bosting (AdaBoost). Local binary patterns (LBP) t...This paper presents a supervised learning algorithm for retinal vascular segmentation based on classification and regression tree (CART) algorithm and improved adptive bosting (AdaBoost). Local binary patterns (LBP) texture features and local features are extracted by extracting,reversing,dilating and enhancing the green components of retinal images to construct a 17-dimensional feature vector. A dataset is constructed by using the feature vector and the data manually marked by the experts. The feature is used to generate CART binary tree for nodes,where CART binary tree is as the AdaBoost weak classifier,and AdaBoost is improved by adding some re-judgment functions to form a strong classifier. The proposed algorithm is simulated on the digital retinal images for vessel extraction (DRIVE). The experimental results show that the proposed algorithm has higher segmentation accuracy for blood vessels,and the result basically contains complete blood vessel details. Moreover,the segmented blood vessel tree has good connectivity,which basically reflects the distribution trend of blood vessels. Compared with the traditional AdaBoost classification algorithm and the support vector machine (SVM) based classification algorithm,the proposed algorithm has higher average accuracy and reliability index,which is similar to the segmentation results of the state-of-the-art segmentation algorithm.展开更多
<strong>Purpose of Review:</strong> The management of eye injuries is both difficult and argumentative. This study attempts to highlight the management of ocular trauma using currently available informatio...<strong>Purpose of Review:</strong> The management of eye injuries is both difficult and argumentative. This study attempts to highlight the management of ocular trauma using currently available information in the literature and author experience. This review presents a workable framework from the first presentation, epidemiology, classification, investigations, management principles, complications, prognostic factors, final visual outcome and management debates. <strong>Review Findings:</strong> Mechanical ocular trauma is a leading cause of monocular blindness and possible handicap worldwide. Among several classification systems, the most widely accepted is Birmingham Eye Trauma Terminology (BETT). Mechanical ocular trauma is a topic of unsolved controversy. Patching for corneal abrasion, paracentesis for hyphema, the timing of cataract surgery and intraocular lens implantation are all issues in anterior segment injuries. Regarding posterior segment controversies, the timing of vitrectomy, use of prophylactic cryotherapy, the necessity of intravitreal antibiotics in the absence of infection, the use of vitrectomy vs vitreous tap in traumatic endophthalmitis is the issues. The pediatric age group needs to be approached by a different protocol due to the risk of amblyopia, intraocular inflammation, and significant vitreoretinal adhesions. The various prognostic factors have a role in the final visual outcome. B scan is used to exclude R.D, Intraocular foreign body (IOFB), and vitreous haemorrhage in hazy media. Individual surgical strategies are used for every patient according to the classification and extent of the injuries. <strong>Conclusion:</strong> This article examines relevant evidence on the management challenges and controversies of mechanical trauma of the eye and offers treatment recommendations based on published research and the authors’ own experience.展开更多
BACKGROUND Liver disease indicates any pathology that can harm or destroy the liver or prevent it from normal functioning.The global community has recently witnessed an increase in the mortality rate due to liver dise...BACKGROUND Liver disease indicates any pathology that can harm or destroy the liver or prevent it from normal functioning.The global community has recently witnessed an increase in the mortality rate due to liver disease.This could be attributed to many factors,among which are human habits,awareness issues,poor healthcare,and late detection.To curb the growing threats from liver disease,early detection is critical to help reduce the risks and improve treatment outcome.Emerging technologies such as machine learning,as shown in this study,could be deployed to assist in enhancing its prediction and treatment.AIM To present a more efficient system for timely prediction of liver disease using a hybrid eXtreme Gradient Boosting model with hyperparameter tuning with a view to assist in early detection,diagnosis,and reduction of risks and mortality associated with the disease.METHODS The dataset used in this study consisted of 416 people with liver problems and 167 with no such history.The data were collected from the state of Andhra Pradesh,India,through https://www.kaggle.com/datasets/uciml/indian-liver-patientrecords.The population was divided into two sets depending on the disease state of the patient.This binary information was recorded in the attribute"is_patient".RESULTS The results indicated that the chi-square automated interaction detection and classification and regression trees models achieved an accuracy level of 71.36%and 73.24%,respectively,which was much better than the conventional method.The proposed solution would assist patients and physicians in tackling the problem of liver disease and ensuring that cases are detected early to prevent it from developing into cirrhosis(scarring)and to enhance the survival of patients.The study showed the potential of machine learning in health care,especially as it concerns disease prediction and monitoring.CONCLUSION This study contributed to the knowledge of machine learning application to health and to the efforts toward combating the problem of liver disease.However,relevant authorities have to invest more into machine learning research and other health technologies to maximize their potential.展开更多
Decision trees and their ensembles became quite popular for data analysis during the past decade.One of the main reasons for that is current boom in big data,where traditional statistical methods(such as,e.g.,multiple...Decision trees and their ensembles became quite popular for data analysis during the past decade.One of the main reasons for that is current boom in big data,where traditional statistical methods(such as,e.g.,multiple linear regression)are not very efficient.However,in chemometrics these methods are still not very widespread,first of all because of several limitations related to the ratio between number of variables and observations.This paper presents several examples on how decision trees and their ensembles can be used in analysis of NIR spectroscopic data both for regression and classification.We will try to consider all important aspects including optimization and validation of models,evaluation of results,treating missing data and selection of most important variables.The performance and outcome of the decision tree-based methods are compared with more traditional approach based on partial least squares.展开更多
Wholesale and retail markets for electricity and power require consumers to forecast electricity consumption at different time intervals. The study aims to</span><span style="font-family:Verdana;"&g...Wholesale and retail markets for electricity and power require consumers to forecast electricity consumption at different time intervals. The study aims to</span><span style="font-family:Verdana;"> increase economic efficiency of the enterprise through the introduction of algorithm for forecasting electric energy consumption unchanged in technological process. Qualitative forecast allows you to essentially reduce costs of electrical </span><span style="font-family:Verdana;">energy, because power cannot be stockpiled. Therefore, when buying excess electrical power, costs can increase either by selling it on the balancing energy </span><span style="font-family:Verdana;">market or by maintaining reserve capacity. If the purchased power is insufficient, the costs increase is due to the purchase of additional capacity. This paper illustrates three methods of forecasting electric energy consumption: autoregressive integrated moving average method, artificial neural networks and classification and regression trees. Actual data from consuming of electrical energy was </span><span style="font-family:Verdana;">used to make day, week and month ahead prediction. The prediction effect of</span><span> </span><span style="font-family:Verdana;">prediction model was proved in Statistica simulation environment. Analysis of estimation of the economic efficiency of prediction methods demonstrated that the use of the artificial neural networks method for short-term forecast </span><span style="font-family:Verdana;">allowed reducing the cost of electricity more efficiently. However, for mid-</span></span><span style="font-family:""> </span><span style="font-family:Verdana;">range predictions, the classification and regression tree was the most efficient method for a Jerky Enterprise. The results indicate that calculation error reduction allows decreases expenses for the purchase of electric energy.展开更多
The contribution of this paper is comparing three popular machine learning methods for software fault prediction. They are classification tree, neural network and case-based reasoning. First, three different classifie...The contribution of this paper is comparing three popular machine learning methods for software fault prediction. They are classification tree, neural network and case-based reasoning. First, three different classifiers are built based on these three different approaches. Second, the three different classifiers utilize the same product metrics as predictor variables to identify the fault-prone components. Third, the predicting results are compared on two aspects, how good prediction capabilities these models are, and how the models support understanding a process represented by the data.展开更多
Ecological protection and high-quality development of the Yellow River basin are becoming part of the national strategy in recent years.The Yellow River Estuary has been seriously affected by human activities.Especial...Ecological protection and high-quality development of the Yellow River basin are becoming part of the national strategy in recent years.The Yellow River Estuary has been seriously affected by human activities.Especially,it has been severely polluted by the nitrogen and phosphorus from land sources,which have caused serious eutrophication and harmful algal blooms.Nutrient criteria,however,was not developed for the Yellow River Estuary,which hindered nutrient management measures and eutrophication risk assessment in this key ecological function zone of China.Based on field data during 2004-2019,we adopted the frequency distribution method,correlation analysis,Linear Regression Model(LRM),Classification and Regression Tree(CART)and Nonparametric Changepoint Analysis(nCPA)methods to establish the nutrient criteria for the adjacent waters of Yellow River Estuary.The water quality criteria of dissolved inorganic nitrogen(DIN)and soluble reactive phosphorus(SRP)are recommended as 244.0μg L^(−1) and 22.4μg L^(−1),respectively.It is hoped that the results will provide scientific basis for the formulation of nutrient standards in this important estuary of China.展开更多
To assist conservationists and policymakers in managing and protecting forests in Beijing from the effects of climate change,this study predicts changes for 2012–2112 in habitable areas of three tree species—Betula ...To assist conservationists and policymakers in managing and protecting forests in Beijing from the effects of climate change,this study predicts changes for 2012–2112 in habitable areas of three tree species—Betula platyphylla,Quercus palustris,Platycladus orientalis,plus other mixed broadleaf species—in Beijing using a classification and regression tree niche model under the International Panel on Climate Change’s A2 and B2 emissions scenarios(SRES).The results show that climate change will increase annual average temperatures in the Beijing area by 2.0–4.7℃,and annual precipitation by 4.7–8.5 mm,depending on the emissions scenario used.These changes result in shifts in the range of each of the species.New suitable areas for distributions of B.platyphylla and Q.palustris will decrease in the future.The model points to significant shifts in the distributions of these species,withdrawing from their current ranges and pushing southward towards central Beijing.Most of the ranges decline during the initial 2012–2040 period before shifting southward and ending up larger overall at the end of the 88-year period.The mixed broadleaf forests expand their ranges significantly.The P.orientalis forests,on the other hand,expand their range marginally.The results indicate that climate change and its effects will accelerate significantly in Beijing over the next 88 years.Water stress is likely to be a major limiting factor on the distribution of forests and the most important factor affecting migration of species into and out of existing nature reserves.There is a potential for the extinction of some species.Therefore,long-term vegetation monitoring and warning systems will be needed to protect local species from habitat loss and genetic swamping of native species by hybrids.展开更多
BACKGROUND Down syndrome(DS)is one of the most common chromosomal aneuploidy diseases.Prenatal screening and diagnostic tests can aid the early diagnosis,appropriate management of these fetuses,and give parents an inf...BACKGROUND Down syndrome(DS)is one of the most common chromosomal aneuploidy diseases.Prenatal screening and diagnostic tests can aid the early diagnosis,appropriate management of these fetuses,and give parents an informed choice about whether or not to terminate a pregnancy.In recent years,investigations have been conducted to achieve a high detection rate(DR)and reduce the false positive rate(FPR).Hospitals have accumulated large numbers of screened cases.However,artificial intelligence methods are rarely used in the risk assessment of prenatal screening for DS.AIM To use a support vector machine algorithm,classification and regression tree algorithm,and AdaBoost algorithm in machine learning for modeling and analysis of prenatal DS screening.METHODS The dataset was from the Center for Prenatal Diagnosis at the First Hospital of Jilin University.We designed and developed intelligent algorithms based on the synthetic minority over-sampling technique(SMOTE)-Tomek and adaptive synthetic sampling over-sampling techniques to preprocess the dataset of prenatal screening information.The machine learning model was then established.Finally,the feasibility of artificial intelligence algorithms in DS screening evaluation is discussed.RESULTS The database contained 31 DS diagnosed cases,accounting for 0.03%of all patients.The dataset showed a large difference between the numbers of DS affected and non-affected cases.A combination of over-sampling and undersampling techniques can greatly increase the performance of the algorithm at processing non-balanced datasets.As the number of iterations increases,the combination of the classification and regression tree algorithm and the SMOTETomek over-sampling technique can obtain a high DR while keeping the FPR to a minimum.CONCLUSION The support vector machine algorithm and the classification and regression tree algorithm achieved good results on the DS screening dataset.When the T21 risk cutoff value was set to 270,machine learning methods had a higher DR and a lower FPR than statistical methods.展开更多
Aims Preserving and restoring Tamarix ramosissima is urgently required in the Tarim Basin,Northwest China.Using species distribution models to predict the biogeographical distribution of species is regularly used in c...Aims Preserving and restoring Tamarix ramosissima is urgently required in the Tarim Basin,Northwest China.Using species distribution models to predict the biogeographical distribution of species is regularly used in conservation and other management activities.However,the uncertainty in the data and models inevitably reduces their prediction power.The major purpose of this study is to assess the impacts of predictor variables and species distribution models on simulating T.ramosissima distribution,to explore the relationships between predictor variables and species distribution models and to model the potential distribution of T.ramosissima in this basin.Methods Three models—the generalized linear model(GLM),classification and regression tree(CART)and Random Forests—were selected and were processed on the BIOMOD platform.The presence/absence data of T.ramosissima in the Tarim Basin,which were calculated from vegetation maps,were used as response variables.Climate,soil and digital elevation model(DEM)data variables were divided into four datasets and then used as predictors.The four datasets were(i)climate variables,(ii)soil,climate and DEM variables,(iii)principal component analysis(PCA)-based climate variables and(iv)PCA-based soil,climate and DEM variables.Important Findings The results indicate that predictive variables for species distribution models should be chosen carefully,because too many predictors can reduce the prediction power.The effectiveness of using PCA to reduce the correlation among predictors and enhance the modelling power depends on the chosen predictor variables and models.Our results implied that it is better to reduce the correlating predictors before model processing.The Random Forests model was more precise than the GLM and CART models.The best model for T.ramosissima was the Random Forests model with climate predictors alone.Soil variables considered in this study could not significantly improve the model’s prediction accuracy for T.ramosissima.The potential distribution area of T.ramosissima in the Tarim Basin is;3.57310^(4) km^(2),which has the potential to mitigate global warming and produce bioenergy through restoring T.ramosissima in the Tarim Basin.展开更多
Impervious surface(IS) is often recognized as the indicator of urban environmental changes. Numerous research efforts have been devoted to studying its spatio-temporal dynamics and ecological effects, especially for t...Impervious surface(IS) is often recognized as the indicator of urban environmental changes. Numerous research efforts have been devoted to studying its spatio-temporal dynamics and ecological effects, especially for the IS in Beijing metropolitan region. However, most previous studies primarily considered the Beijing metropolitan region as a whole without considering the differences and heterogeneity among the function zones. In this study, the subpixel impervious surface results in Beijing within a time series(1991, 2001, 2005, 2011 and 2015) were extracted by means of the classification and regression tree(CART) model combined with change detection models. Then based on the method of standard deviation ellipse, Lorenz curve, contribution index(CI) and landscape metrics, the spatio-temporal dynamics and variations of IS(1991, 2001, 2011 and 2015) in different function zones and districts were analyzed. It is found that the total area of impervious surface in Beijing increased dramatically during the study period, increasing about 144.18%. The deflection angle of major axis of standard deviation ellipse decreased from 47.15° to 38.82°, indicating the major development axis in Beijing gradually moved from northeast-southwest to north-south. Moreover, the heterogeneity of impervious surface’s distribution among 16 districts weakened gradually, but the CI values and landscape metrics in four function zones differed greatly. The urban function extended zone(UFEZ), the main source of the growth of IS in Beijing, had the highest CI values. Its lowest CI value was 1.79 that is still much higher than the highest CI value in other function zones. The core function zone(CFZ), the traditional aggregation zone of impervious surface, had the highest contagion index(CONTAG) values, but it contributed less than UFEZ due to its small area. The CI value of the new urban developed zone(NUDZ) increased rapidly, and it increased from negative to positive and multiplied, becoming animportant contributor to the rise of urban impervious surface. However, the ecological conservation zone(ECZ) had a constant negative contribution all the time, and its CI value decreased gradually. Moreover, the landscape metrics and centroids of impervious surface in different density classes differed greatly. The high-density impervious surface had a more compact configuration and a greater impact on the eco-environment.展开更多
Automatic prosodic break detection and annotation are important for both speech understanding and natural speech synthesis. In this paper, we discuss automatic prosodic break detection and feature analysis. The contri...Automatic prosodic break detection and annotation are important for both speech understanding and natural speech synthesis. In this paper, we discuss automatic prosodic break detection and feature analysis. The contributions of the paper are two aspects. One is that we use classifier combination method to detect Mandarin and English prosodic break using acoustic, lexical and syntactic evidence. Our proposed method achieves better performance on both the Mandarin prosodic annotation corpus Annotated Speech Corpus of Chinese Discourse and the English prosodic annotation corpus -- Boston University Radio News Corpus when compared with the baseline system and other researches' experimental results. The other is the feature analysis for prosodic break detection. The functions of different features, such as duration, pitch, energy, and intensity, are analyzed and compared in Mandarin and English prosodic break detection. Based on the feature analysis, we also verify some linguistic conclusions.展开更多
Background and Aims:It remains difficult to forecast the 180-day prognosis of patients with hepatitis B virus-acuteon-chronic liver failure(HBV-ACLF)using existing prognostic models.The present study aimed to derive n...Background and Aims:It remains difficult to forecast the 180-day prognosis of patients with hepatitis B virus-acuteon-chronic liver failure(HBV-ACLF)using existing prognostic models.The present study aimed to derive novel-innovative models to enhance the predictive effectiveness of the 180-day mortality in HBV-ACLF.Methods:The present cohort study examined 171 HBV-ACLF patients(non-survivors,n=62;survivors,n=109).The 27 retrospectively collected parameters included the basic demographic characteristics,clinical comorbidities,and laboratory values.Backward stepwise logistic regression(LR)and the classification and regression tree(CART)analysis were used to derive two predictive models.Meanwhile,a nomogram was created based on the LR analysis.The accuracy of the LR and CART model was detected through the area under the receiver operating characteristic curve(AUROC),compared with model of end-stage liver disease(MELD)scores.Results:Among 171 HBV-ACLF patients,the mean age was 45.17 years-old,and 11.7%of the patients were female.The LR model was constructed with six independent factors,which included age,total bilirubin,prothrombin activity,lymphocytes,monocytes and hepatic encephalopathy.The following seven variables were the prognostic factors for HBV-ACLF in the CART model:age,total bilirubin,prothrombin time,lymphocytes,neutrophils,monocytes,and blood urea nitrogen.The AUROC for the CART model(0.878)was similar to that for the LR model(0.878,p=0.898),and this exceeded that for the MELD scores(0.728,p<0.0001).Conclusions:The LR and CART model are both superior to the MELD scores in predicting the 180-day mortality of patients with HBV-ACLF.Both the LR and CART model can be used as medical decision-making tools by clinicians.展开更多
We aim to provide a tool for independent system operators to detect the collusion and identify the colluding firms by using day-ahead data. In this paper, an approach based on supervised machine learning is presented ...We aim to provide a tool for independent system operators to detect the collusion and identify the colluding firms by using day-ahead data. In this paper, an approach based on supervised machine learning is presented for collusion detection in electricity markets. The possible scenarios of the collusion among generation firms are firstly identified. Then,for each scenario and possible load demand, market equilibrium is computed. Market equilibrium points under different collusions and their peripheral points are used to train the collusion detection machine using supervised learning approaches such as classification and regression tree(CART) and support vector machine(SVM) algorithms. By applying the proposed approach to a four-firm and ten-generator test system, the accuracy of the proposed approach is evaluated and the efficiency of SVM and CART algorithms in collusion detection are compared with other supervised learning and statistical techniques.展开更多
All numerical weather prediction(NWP) models inherently have substantial biases, especially in the forecast of near-surface weather variables. Statistical methods can be used to remove the systematic error based on ...All numerical weather prediction(NWP) models inherently have substantial biases, especially in the forecast of near-surface weather variables. Statistical methods can be used to remove the systematic error based on historical bias data at observation stations. However, many end users of weather forecasts need bias corrected forecasts at locations that scarcely have any historical bias data. To circumvent this limitation, the bias of surface temperature forecasts on a regular grid covering Iran is removed, by using the information available at observation stations in the vicinity of any given grid point. To this end, the running mean error method is first used to correct the forecasts at observation stations, then four interpolation methods including inverse distance squared weighting with constant lapse rate(IDSW-CLR), Kriging with constant lapse rate(Kriging-CLR), gradient inverse distance squared with linear lapse rate(GIDS-LR), and gradient inverse distance squared with lapse rate determined by classification and regression tree(GIDS-CART), are employed to interpolate the bias corrected forecasts at neighboring observation stations to any given location. The results show that all four interpolation methods used do reduce the model error significantly,but Kriging-CLR has better performance than the other methods. For Kriging-CLR, root mean square error(RMSE)and mean absolute error(MAE) were decreased by 26% and 29%, respectively, as compared to the raw forecasts. It is found also, that after applying any of the proposed methods, unlike the raw forecasts, the bias corrected forecasts do not show spatial or temporal dependency.展开更多
Soil diagnostic horizons, which each have a set of quantified properties, play a key role in soil classification. However, they are difficult to predict, and few attempts have been made to map their spatial occurrence...Soil diagnostic horizons, which each have a set of quantified properties, play a key role in soil classification. However, they are difficult to predict, and few attempts have been made to map their spatial occurrence. We evaluated and compared four machine learning algorithms, namely, the classification and regression tree(CART), random forest(RF), boosted regression trees(BRT), and support vector machine(SVM), to map the occurrence of the soil mattic horizon in the northeastern Qinghai-Tibetan Plateau using readily available ancillary data. The mechanisms of resampling and ensemble techniques significantly improved prediction accuracies(measured based on area under the receiver operator characteristic curve score(AUC)) and produced more stable results for the BRT(AUC of 0.921 ± 0.012, mean ± standard deviation) and RF(0.908 ± 0.013) algorithms compared to the CART algorithm(0.784 ± 0.012), which is the most commonly used machine learning method. Although the SVM algorithm yielded a comparable AUC value(0.906 ± 0.006) to the RF and BRT algorithms, it is sensitive to parameter settings, which are extremely time-consuming.Therefore, we consider it inadequate for occurrence-distribution modeling. Considering the obvious advantages of high prediction accuracy, robustness to parameter settings, the ability to estimate uncertainty in prediction, and easy interpretation of predictor variables, BRT seems to be the most desirable method. These results provide an insight into the use of machine learning algorithms to map the mattic horizon and potentially other soil diagnostic horizons.展开更多
Credit scoring has become a critical and challenging management science issue as the credit industry has been facing stiffer competition in recent years. Many classification methods have been suggested to tackle this ...Credit scoring has become a critical and challenging management science issue as the credit industry has been facing stiffer competition in recent years. Many classification methods have been suggested to tackle this problem in the literature. In this paper, we investigate the performance of various credit scoring models and the corresponding credit risk cost for three real-life credit scoring data sets. Besides the well-known classification algorithms (e.g. linear discriminant analysis, logistic regression, neural networks and k-nearest neighbor), we also investigate the suitability and performance of some recently proposed, advanced data mining techniques such as support vector machines (SVMs), classification and regression tree (CART), and multivariate adaptive regression splines (MARS). The performance is assessed by using the classification accuracy and cost of credit scoring errors. The experiment results show that SVM, MARS, logistic regression and neural networks yield a very good performance. However, CART and MARS's explanatory capability outperforms the other methods.展开更多
文摘The increase of competition, economic recession and financial crises has increased business failure and depending on this the researchers have attempted to develop new approaches which can yield more correct and more reliable results. The classification and regression tree (CART) is one of the new modeling techniques which is developed for this purpose. In this study, the classification and regression trees method is explained and tested the power of the financial failure prediction. CART is applied for the data of industry companies which is trade in Istanbul Stock Exchange (ISE) between 1997-2007. As a result of this study, it has been observed that, CART has a high predicting power of financial failure one, two and three years prior to failure, and profitability ratios being the most important ratios in the prediction of failure.
基金supported by the China Earthquake Administration, Institute of Seismology Foundation (IS201526246)
文摘According to groundwater level monitoring data of Shuping landslide in the Three Gorges Reservoir area, based on the response relationship between influential factors such as rainfall and reservoir level and the change of groundwater level, the influential factors of groundwater level were selected. Then the classification and regression tree(CART) model was constructed by the subset and used to predict the groundwater level. Through the verification, the predictive results of the test sample were consistent with the actually measured values, and the mean absolute error and relative error is 0.28 m and 1.15%respectively. To compare the support vector machine(SVM) model constructed using the same set of factors, the mean absolute error and relative error of predicted results is 1.53 m and 6.11% respectively. It is indicated that CART model has not only better fitting and generalization ability, but also strong advantages in the analysis of landslide groundwater dynamic characteristics and the screening of important variables. It is an effective method for prediction of ground water level in landslides.
基金Under the auspices of National Natural Science Foundation of China(No.41671339)
文摘The sub-pixel impervious surface percentage(SPIS) is the fraction of impervious surface area in one pixel,and it is an important indicator of urbanization.Using remote sensing data,the spatial distribution of SPIS values over large areas can be extracted,and these data are significant for studies of urban climate,environment and hydrology.To develop a stabilized,multi-temporal SPIS estimation method suitable for typical temperate semi-arid climate zones with distinct seasons,an optimal model for estimating SPIS values within Beijing Municipality was built that is based on the classification and regression tree(CART) algorithm.First,models with different input variables for SPIS estimation were built by integrating multi-source remote sensing data with other auxiliary data.The optimal model was selected through the analysis and comparison of the assessed accuracy of these models.Subsequently,multi-temporal SPIS mapping was carried out based on the optimal model.The results are as follows:1) multi-seasonal images and nighttime light(NTL) data are the optimal input variables for SPIS estimation within Beijing Municipality,where the intra-annual variability in vegetation is distinct.The different spectral characteristics in the cultivated land caused by the different farming characteristics and vegetation phenology can be detected by the multi-seasonal images effectively.NLT data can effectively reduce the misestimation caused by the spectral similarity between bare land and impervious surfaces.After testing,the SPIS modeling correlation coefficient(r) is approximately 0.86,the average error(AE) is approximately 12.8%,and the relative error(RE) is approximately 0.39.2) The SPIS results have been divided into areas with high-density impervious cover(70%–100%),medium-density impervious cover(40%–70%),low-density impervious cover(10%–40%) and natural cover(0%–10%).The SPIS model performed better in estimating values for high-density urban areas than other categories.3) Multi-temporal SPIS mapping(1991–2016) was conducted based on the optimized SPIS results for 2005.After testing,AE ranges from 12.7% to 15.2%,RE ranges from 0.39 to 0.46,and r ranges from 0.81 to 0.86.It is demonstrated that the proposed approach for estimating sub-pixel level impervious surface by integrating the CART algorithm and multi-source remote sensing data is feasible and suitable for multi-temporal SPIS mapping of areas with distinct intra-annual variability in vegetation.
基金National Natural Science Foundation of China(No.61163010)
文摘This paper presents a supervised learning algorithm for retinal vascular segmentation based on classification and regression tree (CART) algorithm and improved adptive bosting (AdaBoost). Local binary patterns (LBP) texture features and local features are extracted by extracting,reversing,dilating and enhancing the green components of retinal images to construct a 17-dimensional feature vector. A dataset is constructed by using the feature vector and the data manually marked by the experts. The feature is used to generate CART binary tree for nodes,where CART binary tree is as the AdaBoost weak classifier,and AdaBoost is improved by adding some re-judgment functions to form a strong classifier. The proposed algorithm is simulated on the digital retinal images for vessel extraction (DRIVE). The experimental results show that the proposed algorithm has higher segmentation accuracy for blood vessels,and the result basically contains complete blood vessel details. Moreover,the segmented blood vessel tree has good connectivity,which basically reflects the distribution trend of blood vessels. Compared with the traditional AdaBoost classification algorithm and the support vector machine (SVM) based classification algorithm,the proposed algorithm has higher average accuracy and reliability index,which is similar to the segmentation results of the state-of-the-art segmentation algorithm.
文摘<strong>Purpose of Review:</strong> The management of eye injuries is both difficult and argumentative. This study attempts to highlight the management of ocular trauma using currently available information in the literature and author experience. This review presents a workable framework from the first presentation, epidemiology, classification, investigations, management principles, complications, prognostic factors, final visual outcome and management debates. <strong>Review Findings:</strong> Mechanical ocular trauma is a leading cause of monocular blindness and possible handicap worldwide. Among several classification systems, the most widely accepted is Birmingham Eye Trauma Terminology (BETT). Mechanical ocular trauma is a topic of unsolved controversy. Patching for corneal abrasion, paracentesis for hyphema, the timing of cataract surgery and intraocular lens implantation are all issues in anterior segment injuries. Regarding posterior segment controversies, the timing of vitrectomy, use of prophylactic cryotherapy, the necessity of intravitreal antibiotics in the absence of infection, the use of vitrectomy vs vitreous tap in traumatic endophthalmitis is the issues. The pediatric age group needs to be approached by a different protocol due to the risk of amblyopia, intraocular inflammation, and significant vitreoretinal adhesions. The various prognostic factors have a role in the final visual outcome. B scan is used to exclude R.D, Intraocular foreign body (IOFB), and vitreous haemorrhage in hazy media. Individual surgical strategies are used for every patient according to the classification and extent of the injuries. <strong>Conclusion:</strong> This article examines relevant evidence on the management challenges and controversies of mechanical trauma of the eye and offers treatment recommendations based on published research and the authors’ own experience.
文摘BACKGROUND Liver disease indicates any pathology that can harm or destroy the liver or prevent it from normal functioning.The global community has recently witnessed an increase in the mortality rate due to liver disease.This could be attributed to many factors,among which are human habits,awareness issues,poor healthcare,and late detection.To curb the growing threats from liver disease,early detection is critical to help reduce the risks and improve treatment outcome.Emerging technologies such as machine learning,as shown in this study,could be deployed to assist in enhancing its prediction and treatment.AIM To present a more efficient system for timely prediction of liver disease using a hybrid eXtreme Gradient Boosting model with hyperparameter tuning with a view to assist in early detection,diagnosis,and reduction of risks and mortality associated with the disease.METHODS The dataset used in this study consisted of 416 people with liver problems and 167 with no such history.The data were collected from the state of Andhra Pradesh,India,through https://www.kaggle.com/datasets/uciml/indian-liver-patientrecords.The population was divided into two sets depending on the disease state of the patient.This binary information was recorded in the attribute"is_patient".RESULTS The results indicated that the chi-square automated interaction detection and classification and regression trees models achieved an accuracy level of 71.36%and 73.24%,respectively,which was much better than the conventional method.The proposed solution would assist patients and physicians in tackling the problem of liver disease and ensuring that cases are detected early to prevent it from developing into cirrhosis(scarring)and to enhance the survival of patients.The study showed the potential of machine learning in health care,especially as it concerns disease prediction and monitoring.CONCLUSION This study contributed to the knowledge of machine learning application to health and to the efforts toward combating the problem of liver disease.However,relevant authorities have to invest more into machine learning research and other health technologies to maximize their potential.
文摘Decision trees and their ensembles became quite popular for data analysis during the past decade.One of the main reasons for that is current boom in big data,where traditional statistical methods(such as,e.g.,multiple linear regression)are not very efficient.However,in chemometrics these methods are still not very widespread,first of all because of several limitations related to the ratio between number of variables and observations.This paper presents several examples on how decision trees and their ensembles can be used in analysis of NIR spectroscopic data both for regression and classification.We will try to consider all important aspects including optimization and validation of models,evaluation of results,treating missing data and selection of most important variables.The performance and outcome of the decision tree-based methods are compared with more traditional approach based on partial least squares.
文摘Wholesale and retail markets for electricity and power require consumers to forecast electricity consumption at different time intervals. The study aims to</span><span style="font-family:Verdana;"> increase economic efficiency of the enterprise through the introduction of algorithm for forecasting electric energy consumption unchanged in technological process. Qualitative forecast allows you to essentially reduce costs of electrical </span><span style="font-family:Verdana;">energy, because power cannot be stockpiled. Therefore, when buying excess electrical power, costs can increase either by selling it on the balancing energy </span><span style="font-family:Verdana;">market or by maintaining reserve capacity. If the purchased power is insufficient, the costs increase is due to the purchase of additional capacity. This paper illustrates three methods of forecasting electric energy consumption: autoregressive integrated moving average method, artificial neural networks and classification and regression trees. Actual data from consuming of electrical energy was </span><span style="font-family:Verdana;">used to make day, week and month ahead prediction. The prediction effect of</span><span> </span><span style="font-family:Verdana;">prediction model was proved in Statistica simulation environment. Analysis of estimation of the economic efficiency of prediction methods demonstrated that the use of the artificial neural networks method for short-term forecast </span><span style="font-family:Verdana;">allowed reducing the cost of electricity more efficiently. However, for mid-</span></span><span style="font-family:""> </span><span style="font-family:Verdana;">range predictions, the classification and regression tree was the most efficient method for a Jerky Enterprise. The results indicate that calculation error reduction allows decreases expenses for the purchase of electric energy.
文摘The contribution of this paper is comparing three popular machine learning methods for software fault prediction. They are classification tree, neural network and case-based reasoning. First, three different classifiers are built based on these three different approaches. Second, the three different classifiers utilize the same product metrics as predictor variables to identify the fault-prone components. Third, the predicting results are compared on two aspects, how good prediction capabilities these models are, and how the models support understanding a process represented by the data.
基金supported by the National Key Research and Development Program of China(No.2018YFC1407601).
文摘Ecological protection and high-quality development of the Yellow River basin are becoming part of the national strategy in recent years.The Yellow River Estuary has been seriously affected by human activities.Especially,it has been severely polluted by the nitrogen and phosphorus from land sources,which have caused serious eutrophication and harmful algal blooms.Nutrient criteria,however,was not developed for the Yellow River Estuary,which hindered nutrient management measures and eutrophication risk assessment in this key ecological function zone of China.Based on field data during 2004-2019,we adopted the frequency distribution method,correlation analysis,Linear Regression Model(LRM),Classification and Regression Tree(CART)and Nonparametric Changepoint Analysis(nCPA)methods to establish the nutrient criteria for the adjacent waters of Yellow River Estuary.The water quality criteria of dissolved inorganic nitrogen(DIN)and soluble reactive phosphorus(SRP)are recommended as 244.0μg L^(−1) and 22.4μg L^(−1),respectively.It is hoped that the results will provide scientific basis for the formulation of nutrient standards in this important estuary of China.
基金This research was supported by the Fundamental Research Funds for the Central University(2018RD001).
文摘To assist conservationists and policymakers in managing and protecting forests in Beijing from the effects of climate change,this study predicts changes for 2012–2112 in habitable areas of three tree species—Betula platyphylla,Quercus palustris,Platycladus orientalis,plus other mixed broadleaf species—in Beijing using a classification and regression tree niche model under the International Panel on Climate Change’s A2 and B2 emissions scenarios(SRES).The results show that climate change will increase annual average temperatures in the Beijing area by 2.0–4.7℃,and annual precipitation by 4.7–8.5 mm,depending on the emissions scenario used.These changes result in shifts in the range of each of the species.New suitable areas for distributions of B.platyphylla and Q.palustris will decrease in the future.The model points to significant shifts in the distributions of these species,withdrawing from their current ranges and pushing southward towards central Beijing.Most of the ranges decline during the initial 2012–2040 period before shifting southward and ending up larger overall at the end of the 88-year period.The mixed broadleaf forests expand their ranges significantly.The P.orientalis forests,on the other hand,expand their range marginally.The results indicate that climate change and its effects will accelerate significantly in Beijing over the next 88 years.Water stress is likely to be a major limiting factor on the distribution of forests and the most important factor affecting migration of species into and out of existing nature reserves.There is a potential for the extinction of some species.Therefore,long-term vegetation monitoring and warning systems will be needed to protect local species from habitat loss and genetic swamping of native species by hybrids.
基金Supported by Science and Technology Department of Jilin Province,No.20190302073GX.
文摘BACKGROUND Down syndrome(DS)is one of the most common chromosomal aneuploidy diseases.Prenatal screening and diagnostic tests can aid the early diagnosis,appropriate management of these fetuses,and give parents an informed choice about whether or not to terminate a pregnancy.In recent years,investigations have been conducted to achieve a high detection rate(DR)and reduce the false positive rate(FPR).Hospitals have accumulated large numbers of screened cases.However,artificial intelligence methods are rarely used in the risk assessment of prenatal screening for DS.AIM To use a support vector machine algorithm,classification and regression tree algorithm,and AdaBoost algorithm in machine learning for modeling and analysis of prenatal DS screening.METHODS The dataset was from the Center for Prenatal Diagnosis at the First Hospital of Jilin University.We designed and developed intelligent algorithms based on the synthetic minority over-sampling technique(SMOTE)-Tomek and adaptive synthetic sampling over-sampling techniques to preprocess the dataset of prenatal screening information.The machine learning model was then established.Finally,the feasibility of artificial intelligence algorithms in DS screening evaluation is discussed.RESULTS The database contained 31 DS diagnosed cases,accounting for 0.03%of all patients.The dataset showed a large difference between the numbers of DS affected and non-affected cases.A combination of over-sampling and undersampling techniques can greatly increase the performance of the algorithm at processing non-balanced datasets.As the number of iterations increases,the combination of the classification and regression tree algorithm and the SMOTETomek over-sampling technique can obtain a high DR while keeping the FPR to a minimum.CONCLUSION The support vector machine algorithm and the classification and regression tree algorithm achieved good results on the DS screening dataset.When the T21 risk cutoff value was set to 270,machine learning methods had a higher DR and a lower FPR than statistical methods.
基金National Basic Research Program of China(973 Program)(No.2010CB951303 and No.2009CB421106).
文摘Aims Preserving and restoring Tamarix ramosissima is urgently required in the Tarim Basin,Northwest China.Using species distribution models to predict the biogeographical distribution of species is regularly used in conservation and other management activities.However,the uncertainty in the data and models inevitably reduces their prediction power.The major purpose of this study is to assess the impacts of predictor variables and species distribution models on simulating T.ramosissima distribution,to explore the relationships between predictor variables and species distribution models and to model the potential distribution of T.ramosissima in this basin.Methods Three models—the generalized linear model(GLM),classification and regression tree(CART)and Random Forests—were selected and were processed on the BIOMOD platform.The presence/absence data of T.ramosissima in the Tarim Basin,which were calculated from vegetation maps,were used as response variables.Climate,soil and digital elevation model(DEM)data variables were divided into four datasets and then used as predictors.The four datasets were(i)climate variables,(ii)soil,climate and DEM variables,(iii)principal component analysis(PCA)-based climate variables and(iv)PCA-based soil,climate and DEM variables.Important Findings The results indicate that predictive variables for species distribution models should be chosen carefully,because too many predictors can reduce the prediction power.The effectiveness of using PCA to reduce the correlation among predictors and enhance the modelling power depends on the chosen predictor variables and models.Our results implied that it is better to reduce the correlating predictors before model processing.The Random Forests model was more precise than the GLM and CART models.The best model for T.ramosissima was the Random Forests model with climate predictors alone.Soil variables considered in this study could not significantly improve the model’s prediction accuracy for T.ramosissima.The potential distribution area of T.ramosissima in the Tarim Basin is;3.57310^(4) km^(2),which has the potential to mitigate global warming and produce bioenergy through restoring T.ramosissima in the Tarim Basin.
基金National Basic Research Program of China,No.2015CB953603National Natural Science Foundation of China,No.41671339State Key Laboratory of Earth Surface Processes and Resource Ecology,No.2017-FX-01(1)
文摘Impervious surface(IS) is often recognized as the indicator of urban environmental changes. Numerous research efforts have been devoted to studying its spatio-temporal dynamics and ecological effects, especially for the IS in Beijing metropolitan region. However, most previous studies primarily considered the Beijing metropolitan region as a whole without considering the differences and heterogeneity among the function zones. In this study, the subpixel impervious surface results in Beijing within a time series(1991, 2001, 2005, 2011 and 2015) were extracted by means of the classification and regression tree(CART) model combined with change detection models. Then based on the method of standard deviation ellipse, Lorenz curve, contribution index(CI) and landscape metrics, the spatio-temporal dynamics and variations of IS(1991, 2001, 2011 and 2015) in different function zones and districts were analyzed. It is found that the total area of impervious surface in Beijing increased dramatically during the study period, increasing about 144.18%. The deflection angle of major axis of standard deviation ellipse decreased from 47.15° to 38.82°, indicating the major development axis in Beijing gradually moved from northeast-southwest to north-south. Moreover, the heterogeneity of impervious surface’s distribution among 16 districts weakened gradually, but the CI values and landscape metrics in four function zones differed greatly. The urban function extended zone(UFEZ), the main source of the growth of IS in Beijing, had the highest CI values. Its lowest CI value was 1.79 that is still much higher than the highest CI value in other function zones. The core function zone(CFZ), the traditional aggregation zone of impervious surface, had the highest contagion index(CONTAG) values, but it contributed less than UFEZ due to its small area. The CI value of the new urban developed zone(NUDZ) increased rapidly, and it increased from negative to positive and multiplied, becoming animportant contributor to the rise of urban impervious surface. However, the ecological conservation zone(ECZ) had a constant negative contribution all the time, and its CI value decreased gradually. Moreover, the landscape metrics and centroids of impervious surface in different density classes differed greatly. The high-density impervious surface had a more compact configuration and a greater impact on the eco-environment.
基金Supported by the National Natural Science Foundation of China under Grant Nos. 90820303,90820011the Natural Science Foundation of Shandong Province of China under Grant No. ZR2011FQ024
文摘Automatic prosodic break detection and annotation are important for both speech understanding and natural speech synthesis. In this paper, we discuss automatic prosodic break detection and feature analysis. The contributions of the paper are two aspects. One is that we use classifier combination method to detect Mandarin and English prosodic break using acoustic, lexical and syntactic evidence. Our proposed method achieves better performance on both the Mandarin prosodic annotation corpus Annotated Speech Corpus of Chinese Discourse and the English prosodic annotation corpus -- Boston University Radio News Corpus when compared with the baseline system and other researches' experimental results. The other is the feature analysis for prosodic break detection. The functions of different features, such as duration, pitch, energy, and intensity, are analyzed and compared in Mandarin and English prosodic break detection. Based on the feature analysis, we also verify some linguistic conclusions.
基金The study was supported by the National Natural Science Foundation of China(No.81470888)and(No.82002461)the Medjaden Academy and Research Foundation for Young Scientists(No.MJR20211110)the Fund for Fostering Young Scholars of Peking University Health Science Center(No.BMU2021PY010).
文摘Background and Aims:It remains difficult to forecast the 180-day prognosis of patients with hepatitis B virus-acuteon-chronic liver failure(HBV-ACLF)using existing prognostic models.The present study aimed to derive novel-innovative models to enhance the predictive effectiveness of the 180-day mortality in HBV-ACLF.Methods:The present cohort study examined 171 HBV-ACLF patients(non-survivors,n=62;survivors,n=109).The 27 retrospectively collected parameters included the basic demographic characteristics,clinical comorbidities,and laboratory values.Backward stepwise logistic regression(LR)and the classification and regression tree(CART)analysis were used to derive two predictive models.Meanwhile,a nomogram was created based on the LR analysis.The accuracy of the LR and CART model was detected through the area under the receiver operating characteristic curve(AUROC),compared with model of end-stage liver disease(MELD)scores.Results:Among 171 HBV-ACLF patients,the mean age was 45.17 years-old,and 11.7%of the patients were female.The LR model was constructed with six independent factors,which included age,total bilirubin,prothrombin activity,lymphocytes,monocytes and hepatic encephalopathy.The following seven variables were the prognostic factors for HBV-ACLF in the CART model:age,total bilirubin,prothrombin time,lymphocytes,neutrophils,monocytes,and blood urea nitrogen.The AUROC for the CART model(0.878)was similar to that for the LR model(0.878,p=0.898),and this exceeded that for the MELD scores(0.728,p<0.0001).Conclusions:The LR and CART model are both superior to the MELD scores in predicting the 180-day mortality of patients with HBV-ACLF.Both the LR and CART model can be used as medical decision-making tools by clinicians.
文摘We aim to provide a tool for independent system operators to detect the collusion and identify the colluding firms by using day-ahead data. In this paper, an approach based on supervised machine learning is presented for collusion detection in electricity markets. The possible scenarios of the collusion among generation firms are firstly identified. Then,for each scenario and possible load demand, market equilibrium is computed. Market equilibrium points under different collusions and their peripheral points are used to train the collusion detection machine using supervised learning approaches such as classification and regression tree(CART) and support vector machine(SVM) algorithms. By applying the proposed approach to a four-firm and ten-generator test system, the accuracy of the proposed approach is evaluated and the efficiency of SVM and CART algorithms in collusion detection are compared with other supervised learning and statistical techniques.
文摘All numerical weather prediction(NWP) models inherently have substantial biases, especially in the forecast of near-surface weather variables. Statistical methods can be used to remove the systematic error based on historical bias data at observation stations. However, many end users of weather forecasts need bias corrected forecasts at locations that scarcely have any historical bias data. To circumvent this limitation, the bias of surface temperature forecasts on a regular grid covering Iran is removed, by using the information available at observation stations in the vicinity of any given grid point. To this end, the running mean error method is first used to correct the forecasts at observation stations, then four interpolation methods including inverse distance squared weighting with constant lapse rate(IDSW-CLR), Kriging with constant lapse rate(Kriging-CLR), gradient inverse distance squared with linear lapse rate(GIDS-LR), and gradient inverse distance squared with lapse rate determined by classification and regression tree(GIDS-CART), are employed to interpolate the bias corrected forecasts at neighboring observation stations to any given location. The results show that all four interpolation methods used do reduce the model error significantly,but Kriging-CLR has better performance than the other methods. For Kriging-CLR, root mean square error(RMSE)and mean absolute error(MAE) were decreased by 26% and 29%, respectively, as compared to the raw forecasts. It is found also, that after applying any of the proposed methods, unlike the raw forecasts, the bias corrected forecasts do not show spatial or temporal dependency.
基金supported by the National Natural Science Foundation of China (Nos. 41501229, 41371224, 41130530, and 91325301)the China Postdoctoral Science Foundation (No. 2015M581876)
文摘Soil diagnostic horizons, which each have a set of quantified properties, play a key role in soil classification. However, they are difficult to predict, and few attempts have been made to map their spatial occurrence. We evaluated and compared four machine learning algorithms, namely, the classification and regression tree(CART), random forest(RF), boosted regression trees(BRT), and support vector machine(SVM), to map the occurrence of the soil mattic horizon in the northeastern Qinghai-Tibetan Plateau using readily available ancillary data. The mechanisms of resampling and ensemble techniques significantly improved prediction accuracies(measured based on area under the receiver operator characteristic curve score(AUC)) and produced more stable results for the BRT(AUC of 0.921 ± 0.012, mean ± standard deviation) and RF(0.908 ± 0.013) algorithms compared to the CART algorithm(0.784 ± 0.012), which is the most commonly used machine learning method. Although the SVM algorithm yielded a comparable AUC value(0.906 ± 0.006) to the RF and BRT algorithms, it is sensitive to parameter settings, which are extremely time-consuming.Therefore, we consider it inadequate for occurrence-distribution modeling. Considering the obvious advantages of high prediction accuracy, robustness to parameter settings, the ability to estimate uncertainty in prediction, and easy interpretation of predictor variables, BRT seems to be the most desirable method. These results provide an insight into the use of machine learning algorithms to map the mattic horizon and potentially other soil diagnostic horizons.
基金This work was supported in part by National Science Foundation of China under Grant No. 70171015
文摘Credit scoring has become a critical and challenging management science issue as the credit industry has been facing stiffer competition in recent years. Many classification methods have been suggested to tackle this problem in the literature. In this paper, we investigate the performance of various credit scoring models and the corresponding credit risk cost for three real-life credit scoring data sets. Besides the well-known classification algorithms (e.g. linear discriminant analysis, logistic regression, neural networks and k-nearest neighbor), we also investigate the suitability and performance of some recently proposed, advanced data mining techniques such as support vector machines (SVMs), classification and regression tree (CART), and multivariate adaptive regression splines (MARS). The performance is assessed by using the classification accuracy and cost of credit scoring errors. The experiment results show that SVM, MARS, logistic regression and neural networks yield a very good performance. However, CART and MARS's explanatory capability outperforms the other methods.