This work aimed to generate landslide susceptibility maps for the Three Gorges Reservoir (TGR) area, China, using different machine learning models. Three advanced machine learning methods, namely gradient boosting decision tree (GBDT), random forest (RF), and information value (InV) models, were used, and their performances were assessed and compared. In total, 202 landslides were mapped using a series of field surveys, aerial photographs, and reviews of historical and bibliographical data. Nine causative factors were then considered in generating the landslide susceptibility maps with the GBDT, RF, and InV models. All maps of the causative factors were resampled to a resolution of 28.5 m. Of the 486,289 pixels in the area, 28,526 were landslide pixels and 457,763 were non-landslide pixels. Finally, landslide susceptibility maps were generated with the three models, and their performances were assessed through receiver operating characteristic (ROC) curves, sensitivity, specificity, overall accuracy (OA), and the kappa coefficient (KAPPA). The results showed that the GBDT, RF, and InV models overall produced reasonably accurate landslide susceptibility maps. Among the three methods, GBDT outperformed the other two, and it can provide strong technical support for producing landslide susceptibility maps in the TGR area.
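The evaluation measures named in the landslide abstract can be illustrated with a minimal sketch. It assumes each pixel has a binary label (1 for landslide, 0 for non-landslide) and a susceptibility score; the function names, the 0.5 threshold, and the toy data are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pixel of each class")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity and overall accuracy at one cut-off."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    sensitivity = tp / (tp + fn)   # share of landslide pixels detected
    specificity = tn / (tn + fp)   # share of stable pixels kept as stable
    overall_accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, overall_accuracy


# Hypothetical toy data: three landslide pixels, five non-landslide pixels.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
print(roc_auc(toy_labels, toy_scores))
print(threshold_metrics(toy_labels, toy_scores))
```

With roughly 6% of pixels labelled as landslides, overall accuracy alone can look high for a model that rarely predicts the landslide class, which is presumably why rank-based and chance-corrected measures are reported alongside it.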
Driven piles are used in many geological environments as a practical and convenient structural component. Hence, determining the drivability of piles is of great importance in complex geotechnical applications. Conventional methods of predicting pile drivability often rely on simplified physical models or empirical formulas, which may lack accuracy or applicability in complex geological conditions. Therefore, this study presents a practical machine learning approach, namely a Random Forest (RF) optimized by Bayesian Optimization (BO) and Particle Swarm Optimization (PSO), which not only enhances prediction accuracy but also adapts better to varying geological environments, to predict the drivability parameters of piles (i.e., maximum compressive stress, maximum tensile stress, and blows per foot). In addition, support vector regression, extreme gradient boosting, k-nearest neighbor, and decision tree models are applied for comparison. To train and test these models, of the 4072 data sets collected, each with 17 model inputs, 3258 were randomly selected for training and the remaining 814 were used for testing. The results of these models were compared and evaluated using two performance indices, the root mean square error (RMSE) and the coefficient of determination (R²). The results indicate that the optimized RF model achieved lower RMSE than the other prediction models for the three parameters, specifically 0.044, 0.438, and 0.146, and higher R² values, specifically 0.966, 0.884, and 0.977. In addition, the sensitivity and uncertainty of the optimized RF model were analyzed using Sobol sensitivity analysis and Monte Carlo (MC) simulation. It can be concluded that the optimized RF model could be used to predict the performance of the pile, and it may provide a useful reference for solving problems under similar engineering conditions.
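As an illustration of the data split and the two performance indices in the pile drivability abstract, here is a minimal sketch. The 0.8 training fraction is an assumption inferred from the reported counts (3258 of 4072), and the seed and function names are hypothetical; the study's own tooling is not described in the abstract.

```python
import math
import random


def random_split(n_records, train_fraction=0.8, seed=0):
    """Randomly split record indices into training and testing sets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_records)
    return indices[:n_train], indices[n_train:]


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


train_idx, test_idx = random_split(4072)
print(len(train_idx), len(test_idx))
```

A lower RMSE and an R² closer to 1 both indicate a better fit, but RMSE is expressed in the units of the target, so the three reported RMSE values are not directly comparable with one another.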
Given rapid urbanization worldwide, the Urban Heat Island (UHI) effect has become a severe issue limiting urban sustainability in both large and small cities. To study the spatial pattern of the Surface Urban Heat Island (SUHI) in China's Meihekou City, a method combining Monte Carlo and Random Forest Regression (MC-RFR) was developed to model the relationship between landscape pattern indices and Land Surface Temperature (LST). In this method, Monte Carlo acceptance-rejection sampling was added to the bootstrap layer of RFR to ensure the sensitivity of RFR to outliers of the SUHI effect. The SUHI in 2030 was predicted using MC-RFR and a future landscape pattern modeled by a combined Cellular Automata and Markov model (CA-Markov). Results reveal that forestland can greatly alleviate the SUHI effect, while reasonable construction of urban land can also slow the rising trend of SUHI. MC-RFR characterizes the relationship between landscape pattern and LST better than RFR alone or a Linear Regression model. By 2030, the overall SUHI effect in Meihekou will be greatly enhanced, and the center of urban development will gradually shift to the central and western regions of the city. We suggest that urban designers and managers concentrate vegetation and disperse built-up land in the construction of new urban areas to weaken the SUHI and support sustainability.
The potential of the Random Forest Model for mapping different desertification processes was studied in the Muttuma watershed of the mid-Murrumbidgee river region of New South Wales, Australia. A desertification vulnerability index was developed using climate, terrain, vegetation, soil, and land quality indices to identify environmentally sensitive areas for desertification. The Random Forest Model (RFM) was used to predict desertification processes such as soil erosion, salinization, and waterlogging in the watershed, and the information needed to train the classification algorithms was obtained from satellite imagery interpretation and ground truth data. Climatic factors (evaporation, rainfall, temperature), terrain factors (aspect, slope, slope length, steepness, and wetness index), soil properties (pH, organic carbon, clay and sand content), and vulnerability indices were used as explanatory variables. Classification accuracy and kappa index were calculated for the training and testing datasets. We recorded overall accuracy rates of 87.7% and 72.1% for training and testing sites, respectively. We found larger discrepancies between the overall accuracy rate and the kappa index for the testing datasets (72.2% and 27.5%, respectively), suggesting that not all classes are predicted well. Prediction was good for the soil erosion and no-desertification classes but poor for the salinization and waterlogging classes. Overall, the results suggest that knowledge of desertification processes in training areas can be used to predict desertification processes in unvisited areas.
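The gap between overall accuracy and the kappa index noted in the desertification abstract can be reproduced with a small sketch. The confusion matrix below is invented purely for illustration (rows are reference classes, columns are predicted classes) and is not the study's data.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, given the class marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class case dominated by one class (90 of 100 samples).
toy_confusion = [[85, 5],
                 [8, 2]]
print(accuracy_and_kappa(toy_confusion))
```

In this toy case the accuracy is 0.87 while kappa is about 0.17, because most of the agreement is what chance alone would produce when one class dominates. A similar mechanism is consistent with the abstract's observation that some classes are predicted poorly.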
Modeling the spatial distribution of soil heavy metals is important in determining the safety of contaminated soils for agricultural use. This study used 60 topsoil samples (0-30 cm), multispectral images (Sentinel-2), spectral indices, and ancillary data to model the spatial distribution of heavy metals in the soils along the Nairobi River. The model was generated using the Random Forest package in R. With R² used to assess prediction accuracy, the Random Forest model generated satisfactory results for all the elements. It also ranked the variables in order of their importance to the overall prediction; spectral indices were the most important variables in the rankings. The predicted topsoil maps showed high concentrations of cadmium at the easterly end of the river. Cadmium is an impurity in detergents, and this section is close to the Nairobi water sewerage plant, which could be a direct source of cadmium. Some farms had zinc levels above the World Health Organization recommended limit. The Random Forest model performed satisfactorily. However, the predictions could be improved further by increasing the spatial resolution of the variables and by adding more predictor variables.
Objective: Body fluid mixtures are complex biological samples that frequently occur at crime scenes and can provide important clues for criminal case analysis. DNA methylation assays have been applied in the identification of human body fluids and have exhibited excellent performance in predicting single-source body fluids. The present study aims to develop a methylation SNaPshot multiplex system for body fluid identification and to accurately predict mixture samples. In addition, the value of DNA methylation in the prediction of body fluid mixtures was further explored. Methods: In the present study, 420 samples of body fluid mixtures and 250 samples of single body fluids were tested using an optimized multiplex methylation system. Each kind of body fluid sample presented a specific methylation profile across the 10 markers. Results: Significant differences in methylation levels were observed between the mixtures and single body fluids. For all kinds of mixtures, Spearman's correlation analysis revealed a significant, strong correlation between the methylation levels and component proportions (1:20, 1:10, 1:5, 1:1, 5:1, 10:1, and 20:1). Two random forest classification models were trained, one to predict mixture types and one to predict the mixture proportion of 2 components, based on the methylation levels of the 10 markers. For mixture prediction, Model-1 reached an accuracy of 99.3% in 427 training samples and 100% in 243 independent test samples. For mixture proportion prediction, Model-2 reached an accuracy of 98.8% in 252 training samples and 98.2% in 168 independent test samples. The total prediction accuracy reached 99.3% for body fluid mixtures and 98.6% for the mixture proportions. Conclusion: These results indicate the excellent capability and value of the multiplex methylation system in the identification of forensic body fluid mixtures.
This paper presents new trading models for the stock market and tests whether they are able to consistently generate excess returns on the Singapore Exchange (SGX). Instead of conventional ways of modeling stock prices, we construct models that relate the market indicators directly to a trading decision. Furthermore, unlike a reversal trading system or a binary system of buy and sell, we allow three modes of trade, namely buy, sell, or stand by; the stand-by case is important because it caters to market conditions in which a model does not produce a strong buy or sell signal. Linear trading models are first developed with a scoring technique that gives higher weight to successful indicators, as well as with the Least Squares technique, which fits its weights to match past perfect trades. The linear models are then made adaptive by using a forgetting factor to address market changes. Because stock markets can be highly nonlinear at times, the Random Forest is adopted as a nonlinear trading model and improved with Gradient Boosting to form a new technique, the Gradient Boosted Random Forest. All the models are trained and evaluated on nine stocks and one index, and statistical tests of randomness and of linear and nonlinear correlation are conducted on the data to check the statistical significance of the inputs and their relation to the output before a model is trained. Our empirical results show that the proposed trading methods are able to generate excess returns compared with the buy-and-hold strategy.
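One way to picture the scoring technique, the forgetting factor, and the three trading modes in the trading abstract is the sketch below. Everything in it is an assumption made for illustration: indicator votes are coded as +1 (buy), -1 (sell), or 0 (neutral), and the update rule, the 0.95 forgetting factor, and the 0.3 threshold are hypothetical rather than the paper's formulation.

```python
def update_weights(weights, votes, outcome, forgetting=0.95):
    """Decay old scores, then reward indicators whose vote matched the outcome."""
    return [forgetting * w + (1.0 if v == outcome else 0.0)
            for w, v in zip(weights, votes)]


def decide(weights, votes, threshold=0.3):
    """Map the weighted indicator vote to buy, sell, or stand by."""
    total = sum(weights)
    if total == 0:
        return "stand by"
    signal = sum(w * v for w, v in zip(weights, votes)) / total
    if signal > threshold:
        return "buy"
    if signal < -threshold:
        return "sell"
    return "stand by"  # signal too weak in either direction


weights = [1.0, 1.0, 1.0]
print(decide(weights, [1, 1, -1]))
weights = update_weights(weights, [1, 1, -1], outcome=1)
print(weights)
```

The dead band around zero is what produces the stand-by mode: when indicators disagree, the normalized signal stays inside the threshold and no trade is made.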
Car-following models are the research basis of traffic flow theory and microscopic traffic simulation. In previous work, theory-driven models have been dominant, while data-driven ones are relatively rare. In recent years, Intelligent Transportation System (ITS) technologies, represented by Vehicle-to-Everything (V2X) technology, have been developing rapidly. With these technologies, large-scale, high-quality microscopic vehicle trajectory data can be acquired, which provides the research foundation for modeling car-following behavior with data-driven methods. On this basis, a data-driven car-following model based on the Random Forest (RF) method was constructed in this work, and the Next Generation Simulation (NGSIM) dataset was used to calibrate and train the model. The Artificial Neural Network (ANN) model, GM model, and Full Velocity Difference (FVD) model are employed to comparatively verify the proposed model. The results suggest that the proposed model can accurately describe car-following behavior, with better performance under multiple performance indicators.
To avoid noise and overfitting and to further improve the limited classification performance of the decision tree, a traffic incident detection method based on the random forest algorithm is presented. From the perspective of classification strength and correlation, three experiments are performed to investigate the potential application of random forest to traffic incident detection: a comparison across different numbers of decision trees, a comparison with different decision trees, and a comparison with the neural network. Real traffic data from the I-880 database are used in the experiments. Detection performance is evaluated by common criteria, including the detection rate, the false alarm rate, the mean time to detection, the classification rate, and the area under the receiver operating characteristic (ROC) curve. The experimental results indicate that the model based on random forest can improve the decision rate, reduce the testing time, and obtain a higher classification rate. It is also competitive with multi-layer feed-forward neural networks (MLF).
Height–diameter relationships are essential elements of forest assessment and modeling efforts. In this work, two linear and eighteen nonlinear height–diameter equations were evaluated to find a local model for Oriental beech (Fagus orientalis Lipsky) in the Hyrcanian Forest in Iran. The predictive performance of these models was first assessed by different evaluation criteria: adjusted R² (R²adj), root mean square error (RMSE), relative RMSE (%RMSE), bias, and relative bias (%bias). The best model was selected for use as the base mixed-effects model. Random parameters for test plots were estimated with different tree selection options. Results show that the Chapman–Richards model had better predictive ability than the other models in terms of R²adj (0.81), RMSE (3.7 m), %RMSE (12.9), bias (0.8), and %bias (2.79). Furthermore, the calibration response, based on a selection of four trees from the sample plots, reduced bias and RMSE by about 1.6–2.7%. Our results indicate that the calibrated model produced the most accurate results.
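For readers unfamiliar with the Chapman–Richards curve named in the height–diameter abstract, a minimal sketch follows. It assumes the commonly used form h = 1.3 + a(1 - exp(-b d))^c, where 1.3 m is breast height; the abstract does not state the exact parameterization, and the parameter values below are hypothetical, not the fitted values for Oriental beech.

```python
import math


def chapman_richards_height(dbh_cm, a, b, c, breast_height_m=1.3):
    """Predicted tree height (m) from diameter at breast height (cm)."""
    return breast_height_m + a * (1.0 - math.exp(-b * dbh_cm)) ** c


def bias_and_rmse(observed, predicted):
    """Mean residual (bias) and root mean square error of predictions."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    bias = sum(residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return bias, rmse


# Hypothetical parameters: asymptote a, rate b, shape c.
params = dict(a=30.0, b=0.05, c=1.2)
for dbh in (10, 30, 60):
    print(dbh, round(chapman_richards_height(dbh, **params), 2))
```

The curve starts at breast height for a diameter of zero and levels off toward 1.3 + a for large trees, which is the biologically plausible shape that makes this family popular for height–diameter work.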
Aluminum (Al) alloy, one of the non-ferrous structural materials most widely used in industry, is of great value to the national economy and industrial manufacturing. Classifying Al alloys rapidly and accurately is therefore a significant task. Classification methods based on laser-induced breakdown spectroscopy (LIBS) have been reported in recent years. Although LIBS is an advanced detection technology, it must be combined with a suitable algorithm to achieve rapid and accurate classification. As an important machine learning method, the random forest (RF) algorithm plays a major role in pattern recognition and material classification. This paper introduces a rapid classification method for Al alloys based on LIBS and the RF algorithm. The results show that the best accuracy achieved when classifying Al alloy samples with this method is 98.59%, with an average of 98.45%. The paper also reports how the accuracy varies with the number of trees in the RF and with the size of the training sample set. Based on these relationships, researchers can select optimized RF parameters to achieve a good result. These results show that LIBS combined with the RF algorithm can classify Al alloys rapidly and with high accuracy, which has significant practical value.
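To address the tendency of chi-squared automatic interaction detection (CHAID) decision trees to overfit, a CHAID random forest method (CHAID-RF) is proposed. The method combines random sampling, random feature selection, and ensembling, with CHAID decision trees as the base classifiers, to form CHAID-RF. To verify the effectiveness of CHAID-RF, CART, CHAID, SVM, and RF were selected as comparison algorithms. Accuracy, weighted precision, weighted recall, and weighted F-score were used as evaluation metrics for classification models, and root mean square error was used for regression models. Ten classification data sets and seven regression data sets were used for validation. The experimental results show that CHAID-RF is feasible and effective.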
Traffic flow prediction, as the basis of signal coordination and travel time prediction, has become a research focus in the field of transportation. Researchers have proposed a variety of methods for traffic flow prediction, but most use only the time-domain information of traffic flow data and ignore the impact of spatial correlation on the predicted flow of the target road segment, which leads to poor prediction accuracy. In this paper, a combined traffic flow prediction model based on long short-term memory and random forest (LSTM-RF) is proposed. In the prediction process, the long short-term memory (LSTM) model is used to extract the time-sequence features of the target road segment. The LSTM prediction and the information collected from adjacent upstream and downstream sections are then used together as input features of the random forest model to analyze the spatial-temporal correlation of traffic flow and obtain the final prediction. The method was tested and verified on traffic flow data from 132 urban road sections collected by the license plate recognition system in Guiyang City. The results show that the method achieves better prediction accuracy than either single model, with a clearly reduced prediction error.
Water quality analysis is essential to understanding the ecological status of aquatic life. Conventional water quality index (WQI) assessment methods are limited to features such as acidity or basicity (pH), dissolved oxygen (DO), biological oxygen demand (BOD), chemical oxygen demand (COD), ammoniacal nitrogen (NH3-N), and suspended solids (SS). These features are often insufficient to represent the water quality of a river polluted by heavy metals. Therefore, this paper aims to explore and analyze novel input features in order to formulate an improved WQI. The feasibility of alternative water quality input variables as new discriminant features is discussed. The new discriminant features are a step toward formulating adaptive water quality parameters according to the land use activities surrounding the river. The results of this study indicate that WQI can be predicted using the new input features. This work analyzes 17 new input features, namely conductivity (COND), salinity (SAL), turbidity (TUR), dissolved solids (DS), nitrate (NO3), chloride (Cl), phosphate (PO4), arsenic (As), chromium (Cr), zinc (Zn), calcium (Ca), iron (Fe), potassium (K), magnesium (Mg), sodium (Na), E. coli, and total coliform, in predicting WQI using machine learning techniques. Five regression algorithms, random forest (RF), AdaBoost, support vector regression (SVR), decision tree regression (DTR), and multilayer perceptron (MLP), are applied for preliminary model selection. The results show that the RF algorithm exhibits the best prediction performance, with an R² of 0.974. This work then proposes a modified RF that incorporates the synthetic minority oversampling technique (SMOTE) into the conventional RF method. The modified RF method is shown to achieve 77.68% accuracy, 74% precision, 69% recall, and a 71% F1-score. In addition, a sensitivity analysis is included to highlight the importance of the turbidity variable in WQI prediction. The results of the sensitivity analysis highlight the importance of certain water quality variables that are not present in the conventional WQI formulation.
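The core idea of SMOTE mentioned in the water quality abstract, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched as follows. This is a simplified stand-alone illustration with hypothetical function names and toy data; it is not the implementation used in the study.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples from a list of minority feature vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples with two features each.
toy_minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 3.0]]
for sample in smote(toy_minority, n_new=3):
    print([round(v, 3) for v in sample])
```

Because each synthetic point lies on a segment between two real minority samples, oversampling adds variety without simply duplicating records. The synthetic samples would normally be added to the training set only, so that test metrics still reflect real data.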
Funding: This work was supported in part by the National Natural Science Foundation of China (61601418, 41602362, 61871259), in part by the Opening Foundation of Hunan Engineering and Research Center of Natural Resource Investigation and Monitoring (2020-5), in part by the Qilian Mountain National Park Research Center (Qinghai) (grant number GKQ2019-01), and in part by the Geomatics Technology and Application Key Laboratory of Qinghai Province (grant No. QHDX-2019-01).
Funding: Supported by the National Science Foundation of China (42107183).
Funding: Under the auspices of the National Natural Science Foundation of China (No. 41977411, 41771383) and the Technology Research Project of the Education Department of Jilin Province (No. JJKH20210445KJ).
Funding: Supported by grants from the Natural Science Foundation of Hubei Province (No. 2020CFB780) and the Fundamental Research Funds for the Central Universities (No. 2017KFYXJJ020).
文摘Objective Body fluid mixtures are complex biological samples that frequently occur in crime scenes,and can provide important clues for criminal case analysis.DNA methylation assay has been applied in the identification of human body fluids,and has exhibited excellent performance in predicting single-source body fluids.The present study aims to develop a methylation SNaPshot multiplex system for body fluid identification,and accurately predict the mixture samples.In addition,the value of DNA methylation in the prediction of body fluid mixtures was further explored.Methods In the present study,420 samples of body fluid mixtures and 250 samples of single body fluids were tested using an optimized multiplex methylation system.Each kind of body fluid sample presented the specific methylation profiles of the 10 markers.Results Significant differences in methylation levels were observed between the mixtures and single body fluids.For all kinds of mixtures,the Spearman’s correlation analysis revealed a significantly strong correlation between the methylation levels and component proportions(1:20,1:10,1:5,1:1,5:1,10:1 and 20:1).Two random forest classification models were trained for the prediction of mixture types and the prediction of the mixture proportion of 2 components,based on the methylation levels of 10 markers.For the mixture prediction,Model-1 presented outstanding prediction accuracy,which reached up to 99.3%in 427 training samples,and had a remarkable accuracy of 100%in 243 independent test samples.For the mixture proportion prediction,Model-2 demonstrated an excellent accuracy of 98.8%in 252 training samples,and 98.2%in 168 independent test samples.The total prediction accuracy reached 99.3%for body fluid mixtures and 98.6%for the mixture proportions.Conclusion These results indicate the excellent capability and powerful value of the multiplex methylation system in the identification of forensic body fluid mixtures.
Abstract: This paper presents new trading models for the stock market and tests whether they can consistently generate excess returns on the Singapore Exchange (SGX). Instead of modeling stock prices in the conventional way, we construct models that relate market indicators directly to a trading decision. Furthermore, unlike a reversal trading system or a binary buy/sell system, we allow three modes of trade, namely buy, sell, or stand by; the stand-by mode is important because it covers market conditions in which a model does not produce a strong buy or sell signal. Linear trading models are first developed with a scoring technique, which gives higher weights to successful indicators, and with the Least Squares technique, which fits the weights to match past perfect trades. The linear models are then made adaptive by using a forgetting factor to address market changes. Because stock markets can be highly nonlinear, the Random Forest is adopted as a nonlinear trading model and improved with Gradient Boosting to form a new technique, the Gradient Boosted Random Forest. All the models are trained and evaluated on nine stocks and one index. Before a model is trained, statistical tests for randomness and for linear and nonlinear correlation are conducted on the data to check the statistical significance of the inputs and their relation to the output. Our empirical results show that the proposed trading methods generate excess returns compared with the buy-and-hold strategy.
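The three-mode decision and the forgetting-factor idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the function names, the symmetric `threshold`, and the update rule `score = forgetting * score + hit` are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def update_scores(scores: Sequence[float], hits: Sequence[bool],
                  forgetting: float = 0.95) -> List[float]:
    """Decay old indicator scores and reward indicators that were right.

    Assumed rule (hypothetical): score <- forgetting * score + 1 if the
    indicator was successful on the latest trade, else + 0.
    """
    return [forgetting * s + (1.0 if h else 0.0) for s, h in zip(scores, hits)]


def trade_decision(indicators: Sequence[float], weights: Sequence[float],
                   threshold: float = 0.5) -> str:
    """Map a weighted indicator sum to one of three trading modes."""
    if len(indicators) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    score = sum(w * x for w, x in zip(weights, indicators))
    if score > threshold:
        return "buy"
    if score < -threshold:
        return "sell"
    # Weak signal in either direction: do not trade.
    return "stand by"
```

With `forgetting` below 1, older successes count for less, which is one simple way a linear model can follow market changes.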
Abstract: Car-following models are the research basis of traffic flow theory and microscopic traffic simulation. In previous work, theory-driven models are dominant, while data-driven ones are relatively rare. In recent years, Intelligent Transportation System (ITS) technologies, represented by Vehicle-to-Everything (V2X) technology, have been developing rapidly. With these technologies, large-scale, high-quality microscopic vehicle trajectory data can be acquired, which provides the research foundation for modeling car-following behavior with data-driven methods. On this basis, a data-driven car-following model based on the Random Forest (RF) method was constructed in this work, and the Next Generation Simulation (NGSIM) dataset was used to calibrate and train the constructed model. The Artificial Neural Network (ANN) model, GM model, and Full Velocity Difference (FVD) model are employed to comparatively verify the proposed model. The research results suggest that the model proposed in this work can accurately describe car-following behavior, with better performance under multiple performance indicators.
Funding: The National High Technology Research and Development Program of China (863 Program) (No. 2012AA112304) and the Scientific Innovation Research of College Graduates in Jiangsu Province (No. CXZZ13-0119).
Abstract: To avoid noise and overfitting and to further improve the limited classification performance of an individual decision tree, a traffic incident detection method based on the random forest algorithm is presented. From the perspective of classification strength and correlation, three experiments are performed to investigate the potential application of random forest to traffic incident detection: comparison with a different number of decision trees; comparison with different decision trees; and comparison with the neural network. Real traffic data from the I-880 database are used in the experiments. The detection performance is evaluated by common criteria, including the detection rate, the false alarm rate, the mean time to detection, the classification rate, and the area under the receiver operating characteristic (ROC) curve. The experimental results indicate that the model based on random forest can improve the decision rate, reduce the testing time, and obtain a higher classification rate. It is also competitive with multi-layer feed forward neural networks (MLF).
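Three of the criteria named in this abstract can be computed from binary labels, as in the hedged Python sketch below. The instance-level definitions used here (1 = incident, 0 = no incident) are an assumption; the paper may count detections per incident rather than per sample, and `incident_metrics` is a hypothetical helper name.

```python
from typing import Dict, Sequence


def incident_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Detection rate, false alarm rate and classification rate per sample."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # incidents detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # incidents missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct non-incidents
    return {
        # Share of true incident samples that were flagged.
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of non-incident samples that were wrongly flagged.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of all samples classified correctly.
        "classification_rate": (tp + tn) / len(pairs),
    }
```

The mean time to detection and the ROC area need timestamps and scores rather than hard labels, so they are left out of this sketch.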
Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Abstract: Height–diameter relationships are essential elements of forest assessment and modeling efforts. In this work, two linear and eighteen nonlinear height–diameter equations were evaluated to find a local model for Oriental beech (Fagus orientalis Lipsky) in the Hyrcanian Forest in Iran. The predictive performance of these models was first assessed by several evaluation criteria: adjusted R^2 (R^2_(adj)), root mean square error (RMSE), relative RMSE (%RMSE), bias, and relative bias (%bias). The best model was selected for use as the base mixed-effects model. Random parameters for test plots were estimated with different tree selection options. Results show that the Chapman–Richards model had better predictive ability than the other models in terms of R^2_(adj) (0.81), RMSE (3.7 m), %RMSE (12.9), bias (0.8), and %bias (2.79). Furthermore, the calibration response, based on a selection of four trees from the sample plots, resulted in a reduction percentage for bias and RMSE of about 1.6–2.7%. Our results indicate that the calibrated model produced the most accurate results.
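For orientation, a commonly used form of the Chapman–Richards height–diameter function is h = 1.3 + a(1 − exp(−b·d))^c, where d is diameter at breast height and 1.3 m is breast height. The Python sketch below assumes that form and uses hypothetical parameters, not the values fitted in the paper. The sign convention for bias (observed minus predicted) and the normalization of the relative criteria by the mean observed height are also assumptions.

```python
import math
from typing import Sequence

BREAST_HEIGHT_M = 1.3  # assumed height at which diameter is measured


def chapman_richards(dbh_cm: float, a: float, b: float, c: float) -> float:
    """Predicted tree height (m) for a given diameter at breast height (cm)."""
    return BREAST_HEIGHT_M + a * (1.0 - math.exp(-b * dbh_cm)) ** c


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def bias(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of observed minus predicted (sign convention is an assumption)."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)


def relative(value: float, observed: Sequence[float]) -> float:
    """Express RMSE or bias as a percentage of the mean observed height."""
    return 100.0 * value / (sum(observed) / len(observed))
```

In this form the parameter `a` sets the asymptote (maximum height is 1.3 + a), while `b` and `c` control how quickly the curve approaches it.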
Funding: Supported by the National High Technology Research and Development Program of China (863 Program, No. 2013AA102402).
Abstract: As one of the non-ferrous metal structural materials most widely used in industry and production, aluminum (Al) alloy is of great value to the national economy and industrial manufacturing. Classifying Al alloy rapidly and accurately is therefore a significant task. Classification methods based on laser-induced breakdown spectroscopy (LIBS) have been reported in recent years. Although LIBS is an advanced detection technology, it must be combined with an algorithm to achieve rapid and accurate classification. As an important machine learning method, the random forest (RF) algorithm plays a great role in pattern recognition and material classification. This paper introduces a rapid classification method for Al alloy based on LIBS and the RF algorithm. The results show that the best accuracy reached when using this method to classify Al alloy samples is 98.59%, and the average accuracy is 98.45%. The results also show how the accuracy varies with the number of trees in the RF and with the size of the training sample set. From these relationships, researchers can identify optimized RF parameters to achieve the expected result. These results show that LIBS combined with the RF algorithm can classify Al alloy rapidly and with high accuracy, which has significant practical value.
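Abstract: To address the tendency of chi-squared automatic interaction detection (CHAID) decision trees to overfit, a CHAID random forest method (CHAID Random Forest, CHAID-RF) is proposed. The method adopts random sampling, random feature selection, and an ensemble strategy, using CHAID decision trees as base classifiers to form CHAID-RF. To verify the effectiveness of CHAID-RF, CART, CHAID, SVM, and RF were selected as comparison algorithms. Accuracy, weighted precision, weighted recall, and weighted F-score were used as evaluation metrics for the classification models, and root mean square error was used as the evaluation metric for the regression models. Validation was carried out on 10 classification datasets and 7 regression datasets. The experimental results show that CHAID-RF is feasible and effective.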
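The three ingredients named in this abstract (random sampling, random feature selection, and an ensemble) can be sketched in Python as follows. This is a simplified stand-in, not the CHAID-RF algorithm itself: each base learner is a one-level tree that splits on the candidate feature with the largest Pearson chi-square statistic, whereas real CHAID also merges categories, applies significance tests, and grows trees recursively. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def chi_square(rows, labels, feature):
    """Pearson chi-square statistic between one categorical feature and the class."""
    observed = Counter((row[feature], y) for row, y in zip(rows, labels))
    value_totals = Counter(row[feature] for row in rows)
    class_totals = Counter(labels)
    n = len(rows)
    stat = 0.0
    for value, n_value in value_totals.items():
        for cls, n_cls in class_totals.items():
            expected = n_value * n_cls / n
            stat += (observed[(value, cls)] - expected) ** 2 / expected
    return stat


def fit_stump(rows, labels, features):
    """One-level tree: split on the candidate feature with the largest chi-square."""
    best = max(features, key=lambda f: chi_square(rows, labels, f))
    counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        counts[row[best]][y] += 1
    leaf = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    default = Counter(labels).most_common(1)[0][0]  # used for unseen values
    return best, leaf, default


def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train stumps on bootstrap samples, each with a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(all_features, min(n_features, len(all_features)))
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(leaf.get(row[f], default) for f, leaf, default in forest)
    return votes.most_common(1)[0][0]
```

Because each stump sees a different bootstrap sample and feature subset, the voted prediction depends less on any single split, which is the usual argument for why such ensembles overfit less than one tree.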
Abstract: Traffic flow prediction, as the basis of signal coordination and travel time prediction, has become a research focus in the field of transportation. Researchers have proposed a variety of traffic flow prediction methods, but most of them use only the time-domain information of traffic flow data and ignore the impact of spatial correlation on the flow of the target road segment, which leads to poor prediction accuracy. In this paper, a combined traffic flow prediction model called long short-term memory and random forest (LSTM-RF) is proposed. In the prediction process, the long short-term memory (LSTM) model is used to extract the time sequence features of the target road segment. Then, the value predicted by the LSTM and the information collected from adjacent upstream and downstream sections are used together as the input features of the random forest model to analyze the spatial-temporal correlation of traffic flow and obtain the final prediction results. The model was tested and verified on traffic flow data from 132 urban road sections collected by the license plate recognition system in Guiyang City. The results show that the method achieves better prediction accuracy than either single model, with a clearly reduced prediction error.
Funding: Supported by the Ministry of Higher Education through the MRUN Young Researchers Grant Scheme (MY-RGS), MR001-2019, entitled "Climate Change Mitigation: Artificial Intelligence-Based Integrated Environmental System for Mangrove Forest Conservation", and the UM-RU Grant, ST065-2021, entitled "Climate-Smart Mitigation and Adaptation: Integrated Climate Resilience Strategy for Tropical Marine Ecosystem."
Abstract: Water quality analysis is essential to understand the ecological status of aquatic life. Conventional water quality index (WQI) assessment methods are limited to features such as acidity or basicity (pH), dissolved oxygen (DO), biological oxygen demand (BOD), chemical oxygen demand (COD), ammoniacal nitrogen (NH3-N), and suspended solids (SS). These features are often insufficient to represent the water quality of a heavy metal–polluted river. Therefore, this paper aims to explore and analyze novel input features in order to formulate an improved WQI. The feasibility of alternative water quality input variables as new discriminant features is discussed. The new discriminant features are a step toward formulating adaptive water quality parameters according to the land use activities surrounding the river. The results and analysis of this study demonstrate the feasibility of predicting WQI using new input features. This work analyzes 17 new input features, namely conductivity (COND), salinity (SAL), turbidity (TUR), dissolved solids (DS), nitrate (NO3), chloride (Cl), phosphate (PO4), arsenic (As), chromium (Cr), zinc (Zn), calcium (Ca), iron (Fe), potassium (K), magnesium (Mg), sodium (Na), E. coli, and total coliform, in predicting WQI using machine learning techniques. Five regression algorithms, namely random forest (RF), AdaBoost, support vector regression (SVR), decision tree regression (DTR), and multilayer perceptron (MLP), are applied for preliminary model selection. The results show that the RF algorithm exhibits better prediction performance, with an R^2 of 0.974. This work then proposes a modified RF that incorporates the synthetic minority oversampling technique (SMOTE) into the conventional RF method. The proposed modified RF method is shown to achieve 77.68%, 74%, 69%, and 71% accuracy, precision, recall, and F1-score, respectively. In addition, a sensitivity analysis is included to highlight the importance of the turbidity variable in WQI prediction. The results of the sensitivity analysis highlight the importance of certain water quality variables that are not present in the conventional WQI formulation.
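The oversampling step that this abstract combines with RF can be illustrated with a minimal SMOTE-style sketch in Python. It shows only the core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours. The function name, the default `k`, and the seeding are assumptions, and how the authors integrate the synthetic samples into their modified RF is not described in the abstract.

```python
import math
import random
from typing import List, Sequence, Tuple


def smote(minority: Sequence[Tuple[float, ...]], n_new: int,
          k: int = 3, seed: int = 0) -> List[Tuple[float, ...]]:
    """Create n_new synthetic points between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic
```

The synthetic samples would be appended to the training set of the under-represented class before the forest is fitted, so that the classifier sees a more balanced class distribution.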