Funding: This work was supported by the National Key Research and Development Program of China (Project No. 2016YFC0800208) and the National Natural Science Foundation of China (Project Nos. 51329901, 51528901, 51579190, 51779189).
Abstract: Determining the soil-water characteristic curve (SWCC) at a site is an essential step in applying unsaturated soil mechanics to geotechnical engineering practice. The SWCC can be measured directly through various in-situ and/or laboratory tests. Such direct measurements are, however, costly and time-consuming because of the high standards required for equipment and procedural control and the limitations of testing apparatus. As a result, only a limited number of data points on the SWCC (e.g., volumetric water content vs. matric suction), at a few values of matric suction, are obtained in practice. How to use a limited number of data points to estimate the site-specific SWCC and to quantify the uncertainty (or degrees-of-belief) in the estimated SWCC remains a challenging task. This paper proposes a Bayesian approach to determine a site-specific SWCC based on a limited number of test data and prior knowledge (e.g., engineering experience and judgment). The proposed Bayesian approach quantifies the degrees-of-belief in the estimated SWCC according to site-specific test data and prior knowledge, and simultaneously selects a suitable SWCC model from a number of candidates based on probability logic. To address the computational issues involved in Bayesian analyses, Markov Chain Monte Carlo Simulation (MCMCS), specifically the Metropolis-Hastings (M-H) algorithm, is used to sample the posterior distribution of SWCC model parameters. A Gaussian copula is then applied to evaluate the model evidence from the MCMCS samples, and the most probable SWCC model is selected from a pool of candidates. Because the M-H algorithm samples the posterior without computing its normalizing constant, which is the model evidence, it cannot by itself compare models. The copula-based evidence estimate removes this key limitation and makes the M-H algorithm feasible for Bayesian model selection. The proposed approach is illustrated using real data from the Unsaturated Soil Database (UNSODA) developed by the U.S. Department of Agriculture. It is shown that the proposed approach properly estimates the SWCC from a limited number of site-specific test data and prior knowledge, and reflects the degrees-of-belief in the estimated SWCC in a rational and quantitative manner.
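The posterior-sampling step described above can be illustrated with a minimal random-walk Metropolis-Hastings sketch. The following are hypothetical choices, not taken from the paper or from UNSODA:

- a van Genuchten SWCC as the single candidate model, with fixed residual and saturated water contents (0.05 and 0.45);
- suction in kPa;
- uniform priors on log10(alpha) and n;
- a Gaussian measurement error of 0.01;
- the function names;
- the synthetic data points.

The Gaussian-copula evidence estimation and model-selection step is not shown.

```python
import math
import random


def van_genuchten(psi, alpha, n, theta_r=0.05, theta_s=0.45):
    """Volumetric water content at suction psi (assumed kPa); van Genuchten form, m = 1 - 1/n."""
    m = 1.0 - 1.0 / n
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * psi) ** n) ** m


def log_posterior(params, data, sigma=0.01):
    """Unnormalized log posterior: uniform prior box on (log10 alpha, n) plus Gaussian likelihood."""
    log_alpha, n = params
    if not (-3.0 < log_alpha < 1.0 and 1.05 < n < 5.0):
        return -math.inf  # zero prior density outside the assumed bounds
    alpha = 10.0 ** log_alpha
    sse = sum((theta - van_genuchten(psi, alpha, n)) ** 2 for psi, theta in data)
    return -0.5 * sse / sigma ** 2


def metropolis_hastings(data, start, steps=(0.05, 0.05), n_samples=20000, seed=1):
    """Random-walk M-H chain over (log10 alpha, n); returns samples and acceptance rate."""
    rng = random.Random(seed)
    current = tuple(start)
    current_lp = log_posterior(current, data)
    samples, accepted = [], 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = tuple(c + rng.gauss(0.0, s) for c, s in zip(current, steps))
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples.append(current)
    return samples, accepted / n_samples


# Synthetic, illustrative "measurements" (not UNSODA data), generated from assumed
# true values alpha = 0.1 1/kPa (log10 alpha = -1) and n = 1.8, plus 0.01 noise.
noise = random.Random(0)
suctions = [1.0, 10.0, 30.0, 100.0, 300.0]
data = [(psi, van_genuchten(psi, 0.1, 1.8) + noise.gauss(0.0, 0.01)) for psi in suctions]

samples, acceptance_rate = metropolis_hastings(data, start=(-1.5, 1.5))
kept = samples[5000:]  # discard burn-in
post_mean_log_alpha = sum(s[0] for s in kept) / len(kept)
post_mean_n = sum(s[1] for s in kept) / len(kept)
print(f"acceptance={acceptance_rate:.2f}  log10(alpha)~{post_mean_log_alpha:.2f}  n~{post_mean_n:.2f}")
```

Because the proposal is symmetric, the acceptance test needs only the ratio of unnormalized posteriors. This is also why the chain alone does not yield the model evidence. In practice one would also check convergence and tune the step sizes before using the samples.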
Funding: This work was supported by the National Key R&D Program of China (Project No. 2016YFC0800200), the NRF-NSFC 3rd Joint Research Grant (Earth Science) (Project No. 41861144022), the National Natural Science Foundation of China (Project Nos. 51679174 and 51779189), and the Shenzhen Key Technology R&D Program (Project No. 20170324). The financial support is gratefully acknowledged.
Abstract: Various uncertainties arising during the acquisition of geoscience data may result in anomalous data instances (i.e., outliers) that do not conform to the expected pattern of regular data instances. With sparse multivariate data obtained from geotechnical site investigation, outliers cannot be identified with certainty. The outliers themselves distort the estimated statistics of geotechnical parameters, and these statistics also carry statistical uncertainty resulting from data sparsity. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on the Mahalanobis distance, and identifies as outliers those data instances whose outlying probabilities exceed 0.5. It addresses the distortion of statistics estimated from a dataset containing outliers by a re-sampling technique, and rationally accounts for the statistical uncertainty through Bayesian machine learning. Moreover, the proposed approach provides an exclusive method to determine the outlying components of each outlier. The proposed approach is illustrated and verified using simulated and real-life datasets. The results show that the proposed approach properly identifies, in a probabilistic manner, outliers among sparse multivariate data together with their outlying components. It can significantly reduce the masking effect (i.e., missing some actual outliers because of the distortion of statistics by the outliers and statistical uncertainty). It is also found that outliers among sparse multivariate data instances significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the necessity of data cleaning (e.g., outlier detection) for uncertainty quantification based on geoscience data.
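As a rough illustration of the Mahalanobis-distance idea, the sketch below estimates an "outlying probability" for each point of a small two-parameter dataset. It repeatedly estimates the mean and covariance from random subsets that exclude the point being tested. It then records how often that point's squared Mahalanobis distance exceeds a chi-square cutoff (0.975 quantile, two degrees of freedom). The 0.5 decision threshold follows the abstract.

This is a simplified stand-in, not the paper's method. Plain random-subset resampling replaces the paper's re-sampling scheme and its Bayesian machine learning, so statistical uncertainty is reflected only crudely. The restriction to two dimensions, the subset size, the cutoff level and all function names are hypothetical choices, and the data are synthetic.

```python
import math
import random


def mean_cov_2d(points):
    """Sample mean vector and unbiased 2x2 covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def outlying_probabilities(data, n_resamples=2000, subset_frac=0.5, level=0.975, seed=0):
    """Fraction of random reference subsets against which each point exceeds the cutoff."""
    rng = random.Random(seed)
    cutoff = -2.0 * math.log(1.0 - level)  # chi-square quantile for 2 degrees of freedom
    flagged = [0] * len(data)
    tested = [0] * len(data)
    m = max(3, int(subset_frac * len(data)))
    for _ in range(n_resamples):
        subset = set(rng.sample(range(len(data)), m))
        mean, cov = mean_cov_2d([data[i] for i in subset])
        for i, point in enumerate(data):
            if i in subset:
                continue  # a point never contributes to the statistics used to judge it
            tested[i] += 1
            if mahalanobis_sq(point, mean, cov) > cutoff:
                flagged[i] += 1
    return [f / t if t else 0.0 for f, t in zip(flagged, tested)]


# Synthetic, illustrative data: 20 correlated "regular" points for two hypothetical
# standardized parameters, plus one planted outlier that breaks the correlation.
gen = random.Random(42)
data = []
for _ in range(20):
    x = gen.gauss(0.0, 1.0)
    data.append((x, 0.8 * x + 0.6 * gen.gauss(0.0, 1.0)))
data.append((4.0, -4.0))

probs = outlying_probabilities(data)
outliers = [i for i, p in enumerate(probs) if p > 0.5]  # 0.5 threshold, as in the abstract
print("indices with outlying probability > 0.5:", outliers)
```

Judging each point against subsets that exclude it, and that sometimes exclude other outliers as well, is one simple way to limit the masking effect the abstract describes. Identifying the outlying components of each outlier is not attempted here.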