Water infiltration into soil is an important process in the hydrologic cycle; however, its measurement is difficult, time-consuming, and costly. Empirical and physical models have been developed to predict cumulative infiltration (CI), but they are often inaccurate. In this study, several novel standalone machine learning algorithms (M5Prime (M5P), decision stump (DS), and sequential minimal optimization (SMO)) and hybrid algorithms based on additive regression (AR) (i.e., AR-M5P, AR-DS, and AR-SMO) and weighted instance handler wrapper (WIHW) (i.e., WIHW-M5P, WIHW-DS, and WIHW-SMO) were developed for CI prediction. The Soil Conservation Service (SCS) model developed by the United States Department of Agriculture (USDA), one of the most popular empirical models for predicting CI, was used as a benchmark. Overall, 154 measurements of CI (explanatory/input variables) were taken from 16 sites in a semi-arid region of Iran (Illam and Lorestan provinces). Six input variable combinations were considered based on Pearson correlations between candidate model inputs (time of measuring and soil bulk density, moisture content, and sand, clay, and silt percentages) and CI. The dataset was divided into two subgroups at random: 70% of the data were used for model building (training dataset) and the remaining 30% were used for model validation (testing dataset). The models were evaluated using graphical approaches (bar charts, scatter plots, violin plots, and Taylor diagrams) and quantitative measures (root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe efficiency (NSE), and percent bias (PBIAS)). Time of measuring had the highest correlation with CI in the study area. The best input combination differed from one algorithm to another. The results showed that all hybrid algorithms improved CI prediction accuracy compared to the standalone models. The AR-M5P model provided the most accurate CI predictions (RMSE = 0.75 cm, MAE = 0.59 cm, NSE = 0.98), while the SCS model had the lowest performance (RMSE = 4.77 cm, MAE = 2.64 cm, NSE = 0.23). The differences in RMSE between the best model (AR-M5P) and the second-best (WIHW-M5P) and worst (SCS) models were 40% and 84%, respectively.
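The four quantitative measures named in the abstract have standard textbook definitions, sketched below in plain Python. The function names are illustrative and are not taken from the study; the sign convention for PBIAS (positive means the model underestimates) is an assumption, since the abstract does not state which convention was used. The last helper shows one reading of "difference in RMSE" that is consistent with the reported 84% for AR-M5P versus SCS.

```python
import math

def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error, in the units of the observations."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def pbias(obs, pred):
    """Percent bias. Assumed convention: positive values mean underestimation."""
    return 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)

def relative_rmse_reduction(rmse_reference, rmse_model):
    """Fraction by which rmse_model is lower than rmse_reference (assumed reading)."""
    return (rmse_reference - rmse_model) / rmse_reference

if __name__ == "__main__":
    # Made-up toy values, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]
    print(rmse(observed, predicted), mae(observed, predicted))
    print(nse(observed, predicted), pbias(observed, predicted))
    # RMSE values reported in the abstract: SCS = 4.77 cm, AR-M5P = 0.75 cm.
    print(round(relative_rmse_reduction(4.77, 0.75), 2))
```

Because NSE is normalized by the variance of the observations, it can be compared across sites with different infiltration ranges, whereas RMSE and MAE stay in centimeters.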
The value difference metric (VDM) is one of the best-known and most widely used distance functions for nominal attributes. This work applies the instance weighting technique to improve VDM. An instance weighted value difference metric (IWVDM) is proposed here. Different from prior work, IWVDM uses naive Bayes (NB) to find weights for training instances. Because early work has shown that there is a close relationship between VDM and NB, some work on NB can be applied to VDM. The weight of a training instance x that belongs to class c is assigned according to the difference between the conditional probability P(c|x) estimated by NB and the true conditional probability P(c|x), and the weight is adjusted iteratively. Compared with previous work, IWVDM has the advantage of reducing the time complexity of the process of finding weights while simultaneously improving the performance of VDM. Experimental results on 36 UCI datasets validate the effectiveness of IWVDM.
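As a rough illustration of the idea, the sketch below implements the commonly cited form of VDM, in which the distance between two instances sums, over attributes and classes, the absolute difference of the class probabilities conditioned on each attribute value. Instance weights enter only through weighted counts. The names `fit_vdm` and `vdm_distance`, the exponent `q`, and the assumption that weights are given and strictly positive are all illustrative; the NB-based iterative weight search described in the abstract is not reproduced here because the abstract does not specify it.

```python
from collections import defaultdict

def fit_vdm(X, y, weights=None):
    """Estimate P(class | attribute value) from (optionally weighted) counts.

    X is a list of tuples of nominal values, y the class labels, and
    weights an optional list of positive per-instance weights.
    """
    if weights is None:
        weights = [1.0] * len(X)
    classes = sorted(set(y))
    n_attrs = len(X[0])
    # counts[a][v][c] = total weight of instances with value v on attribute a and class c
    counts = [defaultdict(lambda: defaultdict(float)) for _ in range(n_attrs)]
    for row, label, w in zip(X, y, weights):
        for a, v in enumerate(row):
            counts[a][v][label] += w
    probs = []
    for a in range(n_attrs):
        table = {}
        for v, by_class in counts[a].items():
            total = sum(by_class.values())
            table[v] = {c: by_class.get(c, 0.0) / total for c in classes}
        probs.append(table)
    return classes, probs

def vdm_distance(x1, x2, classes, probs, q=1):
    """Sum over attributes and classes of |P(c | v1) - P(c | v2)| ** q."""
    dist = 0.0
    for a, (v1, v2) in enumerate(zip(x1, x2)):
        p1 = probs[a].get(v1, {})  # unseen values contribute probability 0
        p2 = probs[a].get(v2, {})
        dist += sum(abs(p1.get(c, 0.0) - p2.get(c, 0.0)) ** q for c in classes)
    return dist

if __name__ == "__main__":
    # Made-up toy data, not from the paper.
    X = [("red", "s"), ("red", "l"), ("blue", "s"), ("blue", "l")]
    y = ["a", "a", "b", "b"]
    classes, probs = fit_vdm(X, y)
    print(vdm_distance(("red", "s"), ("blue", "l"), classes, probs))
    # Up-weighting the last instance changes the estimated probabilities.
    classes, probs = fit_vdm(X, y, weights=[1.0, 1.0, 1.0, 3.0])
    print(vdm_distance(("red", "s"), ("red", "l"), classes, probs))
```

With uniform weights this reduces to plain VDM, so the two can be compared on the same data by changing only the `weights` argument.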