A diagnostic procedure based on maximum likelihood estimation is presented for studying the convergence of the Markov chain produced by the Gibbs sampler. The unbiasedness, consistency, and asymptotic normality of the parameter estimates produced by the procedure are considered. An example is provided to illustrate the procedure, and the numerical results are consistent with the theoretical ones.
Based on the convergence rate defined by the Pearson χ² distance, this paper discusses the properties of different Gibbs sampling schemes. Under a set of regularity conditions, it is proved that the convergence rate of the systematic-scan Gibbs sampler is the norm of a forward operator. We also discuss the result, proposed by Liu et al., that the collapsed Gibbs sampler has a faster convergence rate than the systematic-scan Gibbs sampler; based on the convergence rate defined by the Pearson χ² distance, this paper proves that result quantitatively. From Theorem 2, we also prove that the convergence rate defined by Roberts and Sahu through the spectral radius of a matrix is equivalent to the corresponding radius of the forward operator.
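To make the sampling scheme under discussion concrete, here is a minimal toy sketch (not from the paper) of a systematic-scan Gibbs sampler for a standard bivariate normal with correlation rho; each sweep updates the coordinates in a fixed order, which is the scan scheme whose convergence rate the paper relates to the norm of a forward operator. All names and the target distribution are illustrative.

```python
import numpy as np

def systematic_scan_gibbs(rho, n_iter=5000, seed=0):
    """Systematic-scan Gibbs sampler for a standard bivariate normal."""
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0
    samples = np.empty((n_iter, 2))
    sd = np.sqrt(1.0 - rho**2)  # conditional standard deviation
    for t in range(n_iter):
        # Full conditionals of a standard bivariate normal with correlation rho:
        # x1 | x2 ~ N(rho*x2, 1-rho^2), and symmetrically for x2.
        x1 = rng.normal(rho * x2, sd)
        x2 = rng.normal(rho * x1, sd)
        samples[t] = (x1, x2)
    return samples

samples = systematic_scan_gibbs(rho=0.9)
print(np.corrcoef(samples[1000:].T)[0, 1])  # close to 0.9 after burn-in
```

The higher rho is, the more correlated the full conditionals are and the more slowly the scan mixes, which is exactly the regime where the choice of scheme (systematic scan vs. collapsed) matters.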
Background: Many phenotypes in animal breeding are derived from incomplete measures, especially if they are challenging or expensive to measure precisely. Examples include time-dependent traits such as reproductive status or lifespan. Incomplete measures for these traits result in phenotypes that are subject to left-, interval-, and right-censoring, where phenotypes are only known to fall below an upper bound, between a lower and an upper bound, or above a lower bound, respectively. Here we compare three methods for deriving phenotypes from incomplete data, using age at first elevation (>1 ng/mL) in blood plasma progesterone (AGEP4), which generally coincides with the onset of puberty, as an example trait.
Methods: We produced AGEP4 phenotypes from three blood samples collected at about 30-day intervals from approximately 5,000 Holstein–Friesian or Holstein–Friesian × Jersey cross-bred dairy heifers managed in 54 seasonal-calving, pasture-based herds in New Zealand. We used these actual data to simulate 7 different visit scenarios, increasing the extent of censoring by disregarding data from one or two of the three visits. Three methods for deriving phenotypes from these data were explored: 1) ordinal categorical variables, analysed using categorical threshold analysis; 2) continuous variables, with a penalty of 31 d assigned to right-censored phenotypes; and 3) continuous variables, sampled from within a lower and upper bound using a data augmentation approach.
Results: Credibility intervals for heritability estimates overlapped across all methods and visit scenarios, but estimated heritabilities tended to be higher when left censoring was reduced. For sires with at least 5 daughters, the correlations between estimated breeding values (EBVs) from our three-visit scenario and each reduced data scenario varied by method, ranging from 0.65 to 0.95. The estimated breed effects also varied by method, but breed differences were smaller as phenotype censoring increased.
Conclusion: Our results indicate that, using some methods, phenotypes derived from one observation per offspring for a time-dependent trait such as AGEP4 may provide sire rankings comparable to three observations per offspring. This has implications for the design of large-scale phenotyping initiatives where animal breeders aim to estimate variance parameters and estimated breeding values (EBVs) for phenotypes that are challenging to measure or prohibitively expensive.
In this letter, the detection of asynchronous DS-CDMA signals with multipath fading and interference from neighboring cells is studied. A novel multiuser detector based on the Gibbs sampler is proposed, in which the Gibbs sampler performs Bayesian multiuser detection based on the output of a linear group-blind decorrelator. Since the Gibbs sampler depends on parameter estimates that can be improved using the detector's output, an enhanced Gibbs-sampler-based detector using the improved parameter estimates is put forward. The proposed multiuser detection technique offers high performance and wide applicability, and computer simulations show its effectiveness.
Recently, a new soft-in soft-out detection algorithm based on the Markov Chain Monte Carlo (MCMC) simulation technique was proposed for Multiple-Input Multiple-Output (MIMO) systems and shown to perform significantly better than its sphere-decoding counterparts at relatively low complexity. However, the MCMC simulator is likely to get trapped in a fixed state when the channel SNR is high, so many repeated samples are observed and the accuracy of the A Posteriori Probability (APP) estimates deteriorates. To solve this problem, an improved MCMC simulator, named the forced-dispersed MCMC algorithm, is proposed. The Gibbs sampler is monitored based on the a posteriori variance of each bit; once a trapped state is detected, the sample is dispersed intentionally according to the a posteriori variance. Extensive simulations show that, compared with the existing solution, the proposed algorithm enables the Markov chain to visit more states, which ensures near-optimal performance.
When simulating a multivariate density with multiple humps, Markov chain Monte Carlo methods face the obstacle of escaping from one hump to another: doing so usually takes an extraordinarily long time, making the simulation practically impossible. To overcome this difficulty, a reversible scheme for generating a Markov chain is suggested, under which the simulation can, in rather general cases, avoid being trapped in local humps.
Overlapping community detection has become a very active research topic in recent decades, and a plethora of methods have been proposed. However, a common limitation of many existing overlapping community detection approaches is that the number of communities K must be predefined manually. We propose a flexible nonparametric Bayesian generative model for count-valued networks, which allows K to grow as more data are encountered instead of being fixed in advance. The Indian buffet process is used to model the community assignment matrix Z, and an uncollapsed Gibbs sampler is derived. However, as the community assignment matrix Z is a structured multi-variable parameter, how to summarize the posterior inference results and estimate the inference quality for Z remains a considerable challenge in the literature. In this paper, a graph classifier based on a graph convolutional neural network is utilized to help summarize the results and estimate the inference quality for Z. We conduct extensive experiments on synthetic and real data, and find that, empirically, the traditional posterior summarization strategy is reliable.
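The Indian buffet process prior mentioned above can be sampled with a short "restaurant" construction: customer n takes each existing dish k with probability m_k/n (where m_k counts earlier takers) and then tries a Poisson(alpha/n) number of new dishes. The sketch below is a generic IBP draw, not the paper's full model; the parameter names are illustrative.

```python
import numpy as np

def sample_ibp(n_objects, alpha, seed=0):
    """Draw a binary feature-assignment matrix Z from the Indian buffet process."""
    rng = np.random.default_rng(seed)
    dishes = []  # dishes[k] = number of earlier customers who took dish k
    rows = []
    for n in range(1, n_objects + 1):
        row = []
        for k, m_k in enumerate(dishes):
            take = rng.uniform() < m_k / n  # existing dish k with prob m_k/n
            row.append(int(take))
            if take:
                dishes[k] += 1
        n_new = rng.poisson(alpha / n)      # Poisson(alpha/n) brand-new dishes
        row.extend([1] * n_new)
        dishes.extend([1] * n_new)
        rows.append(row)
    K = len(dishes)
    Z = np.zeros((n_objects, K), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(n_objects=10, alpha=2.0)
print(Z.shape)  # (10, K), where the number of columns K is itself random
```

This is what lets K grow with the data: each draw determines its own number of active columns instead of fixing K in advance.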
Recently, variable selection based on penalized regression methods has received a great deal of attention, mostly through frequentist models. This paper investigates regularized regression from a Bayesian perspective. Our new method extends Bayesian Lasso regression (Park and Casella, 2008) by replacing the least-squares loss and Lasso penalty with a composite quantile loss function and an adaptive Lasso penalty, which allows different penalization parameters for different regression coefficients. Based on the Bayesian hierarchical model framework, an efficient Gibbs sampler is derived to simulate the parameters from their posterior distributions. Furthermore, we study Bayesian composite quantile regression with an adaptive group Lasso penalty. The distinguishing characteristic of the newly proposed method is that it is completely data-adaptive, requiring no prior knowledge of the error distribution. Extensive simulations and two real data examples are used to examine the performance of the proposed method. All results confirm that our method is both robust and highly efficient, and often outperforms other approaches.
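For context, here is a minimal sketch of the Park and Casella (2008) Bayesian Lasso Gibbs sampler that this paper extends. It uses the original least-squares loss and a single penalty parameter lam; the paper replaces these with a composite quantile loss and adaptive, coefficient-specific penalties, which this sketch does not implement.

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=2000, seed=0):
    """Gibbs sampler for the Bayesian Lasso (Park and Casella, 2008)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    sigma2 = 1.0
    inv_tau2 = np.ones(p)  # 1/tau_j^2, latent scales of the Laplace prior
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), A = X'X + diag(1/tau_j^2)
        A_inv = np.linalg.inv(X.T @ X + np.diag(inv_tau2))
        beta = rng.multivariate_normal(A_inv @ X.T @ y, sigma2 * A_inv)
        # sigma2 | rest ~ Inverse-Gamma((n-1)/2 + p/2, rate)
        resid = y - X @ beta
        rate = resid @ resid / 2 + beta @ (inv_tau2 * beta) / 2
        sigma2 = 1.0 / rng.gamma((n - 1) / 2 + p / 2, 1.0 / rate)
        # 1/tau_j^2 | rest ~ Inverse-Gaussian (numpy's "wald" distribution)
        mu = np.sqrt(lam**2 * sigma2 / beta**2)
        inv_tau2 = np.array([rng.wald(m, lam**2) for m in mu])
        draws[t] = beta
    return draws

# Usage on synthetic data with a sparse truth:
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=100)
post_mean = bayesian_lasso_gibbs(X, y).mean(axis=0)
```

The adaptive extension studied in the paper would, roughly, replace the shared lam by per-coefficient penalties lam_j in the last update; the conjugate structure that makes every step a standard draw is what makes the Gibbs sampler efficient.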
The core of nonparametric/semiparametric Bayesian analysis is to relax particular parametric assumptions by treating the distributions of interest as unknown and random, and assigning them a prior. Selecting a suitable prior is therefore especially critical in nonparametric Bayesian fitting. As a distribution over distributions, the Dirichlet process (DP) is the most widely used nonparametric prior due to its nice theoretical properties, modeling flexibility, and computational feasibility. In this paper, we review and summarize developments of the DP during the past decades. Our focus is mainly on its theoretical properties, various extensions, statistical modeling, and applications to latent variable models.
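One standard representation covered by such reviews is the stick-breaking construction of a draw from DP(alpha, G0): break a unit stick with Beta(1, alpha) proportions and attach each piece's weight to an atom drawn from the base measure G0. The sketch below truncates the infinite sum at a finite number of atoms and takes G0 = N(0, 1) purely for illustration.

```python
import numpy as np

def stick_breaking_dp(alpha, n_atoms=1000, seed=0):
    """Truncated stick-breaking draw from DP(alpha, G0) with G0 = N(0, 1)."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=n_atoms)  # stick-breaking proportions
    # w_k = v_k * prod_{j<k} (1 - v_j): the length of the k-th broken piece.
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    atoms = rng.normal(size=n_atoms)        # atom locations drawn from G0
    return w, atoms

w, atoms = stick_breaking_dp(alpha=2.0)
print(w.sum())  # close to 1 for a large truncation level
```

Smaller alpha concentrates the weight on a few atoms (draws look almost parametric), while larger alpha spreads it over many, which is one way the DP controls how far the fit departs from a parametric model.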
Sampling from a truncated multivariate normal distribution (TMVND) constitutes the core computational module in fitting many statistical and econometric models. We propose two efficient methods, an iterative data augmentation (DA) algorithm and a non-iterative inverse Bayes formulae (IBF) sampler, to simulate the TMVND, and generalize them to multivariate normal distributions with linear inequality constraints. By creating a Bayesian incomplete-data structure, the posterior step of the DA algorithm directly generates random vector draws rather than single-element draws, resulting in an obvious computational advantage and easy coding with common statistical software packages such as S-PLUS, MATLAB, and GAUSS. Furthermore, the DA provides a ready structure for implementing a fast EM algorithm to identify the mode of the TMVND, which has many potential applications in statistical inference for constrained-parameter problems. In addition, utilizing this mode as an intermediate result, IBF sampling provides a novel alternative to Gibbs sampling and eliminates problems with non-convergence and possibly slow convergence due to the high correlation between components of a TMVND. The DA algorithm is applied to a linear regression model with constrained parameters and is illustrated with a published data set. Numerical comparisons show that the proposed DA algorithm and IBF sampler are more efficient than the Gibbs sampler and the accept-reject algorithm.
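For reference, here is a minimal sketch of the single-element Gibbs baseline the paper compares its DA and IBF samplers against: coordinate-wise sampling of a standard bivariate normal truncated to a box, with a naive rejection step for each univariate truncated-normal conditional. This is a generic illustration, not the paper's method; the rejection step is only acceptable when the box is not far in the tails.

```python
import numpy as np

def sample_trunc_normal(rng, mu, sd, lo, hi):
    """Naive rejection draw from N(mu, sd^2) truncated to [lo, hi]."""
    while True:
        x = rng.normal(mu, sd)
        if lo <= x <= hi:
            return x

def tmvn_gibbs(rho, lo, hi, n_iter=5000, seed=0):
    """Coordinate-wise Gibbs sampler for a bivariate normal truncated to a box."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho**2)
    x1 = x2 = 0.5 * (lo + hi)
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Conditionals of a standard bivariate normal, truncated to [lo, hi].
        x1 = sample_trunc_normal(rng, rho * x2, sd, lo, hi)
        x2 = sample_trunc_normal(rng, rho * x1, sd, lo, hi)
        out[t] = (x1, x2)
    return out

draws = tmvn_gibbs(rho=0.8, lo=0.0, hi=1.0)
```

The higher the correlation rho, the smaller the conditional moves and the slower this chain mixes; that is precisely the slow-convergence behavior the paper's vector-draw DA step and non-iterative IBF sampler are designed to avoid.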
The main goal of this paper is to employ longitudinal trajectories of a significant number of sub-regional brain volumetric MRI measures as statistical predictors for Alzheimer's disease (AD) classification. We use logistic regression in a Bayesian framework that includes many functional predictors. Directly sampling the regression coefficients from the Bayesian logistic model is difficult due to its complicated likelihood function. In high-dimensional scenarios, the selection of predictors is paramount, through the introduction of spike-and-slab priors, non-local priors, or horseshoe priors. We seek to avoid the complicated Metropolis–Hastings approach and to develop an easily implementable Gibbs sampler. In addition, Bayesian estimation provides proper estimates of the model parameters, which are also useful for inference. Another advantage of working with logistic regression is that it yields the log odds of relative risk for AD compared with normal controls based on the selected longitudinal predictors, rather than simply classifying patients based on cross-sectional estimates. Ultimately, however, we combine approaches and use a probability threshold to classify individual patients. We employ 49 functional predictors consisting of volumetric estimates of brain sub-regions, chosen for their established clinical significance. Moreover, the use of spike-and-slab priors ensures that many redundant predictors are dropped from the model.
Funding: supported by the National Natural Science Foundation of China (Grant No. 10431010) and the National Basic Research Project 973 (Grant No. 2003CB715900).
Funding: funded by New Zealand dairy farmers through DairyNZ Inc. and by the New Zealand Ministry of Business, Innovation and Employment (DRCX1302); support was kindly received from LIC and CRV.
Funding: supported by the National Basic Research Program of China (973) (2012CB316402); the National Natural Science Foundation of China (Grant Nos. 61332005, 61725205); the Research Project of the North Minzu University (2019XYZJK02, 2019XYZJK05, 2017KJ24, 2017KJ25, 2019MS002); Ningxia first-class discipline and scientific research projects (electronic science and technology, NXYLXK2017A07); the NingXia Provincial Key Discipline Project (Computer Application); and the Provincial Natural Science Foundation of NingXia (NZ17111, 2020AAC03219).
Funding: supported in part by the National Natural Science Foundation of China (11501372, 11571112); the National Social Science Fund (15BTJ027); the Doctoral Fund of the Ministry of Education of China (20130076110004); the Program of Shanghai Subject Chief Scientist (14XD1401600); the 111 Project of China (B14019); and the Natural Science Fund of Nantong University (14B28).
Funding: supported in part by the National Natural Science Foundation of China (Grant No. 11471161) and the Technological Innovation Item in Jiangsu Province (No. BK2008156).
Funding: supported by the National Social Science Foundation of China (No. 09BTJ012); the Scientific Research Fund of Hunan Provincial Education Department (No. 09c390); a HKU Seed Funding Program for Basic Research (Project No. 2009-1115-9042); and a grant from the Hong Kong Research Grants Council General Research Fund (Project No. HKU779210M).
Funding: supported by the Directorate for Mathematical and Physical Sciences [1924724].