Overlapping community detection has become an active research topic in recent decades, and a plethora of methods have been proposed. However, a common challenge in many existing approaches is that the number of communities K must be specified manually in advance. We propose a flexible nonparametric Bayesian generative model for count-valued networks that allows K to grow as more data are encountered rather than being fixed in advance. The Indian buffet process is used to model the community assignment matrix Z, and an uncollapsed Gibbs sampler is derived. However, because Z is a structured multivariate parameter, summarizing the posterior inference results and estimating the quality of inference about Z remains a considerable challenge in the literature. In this paper, a graph classifier based on a graph convolutional neural network is used to help summarize the results and estimate the quality of inference about Z. We conduct extensive experiments on synthetic and real data and find empirically that the traditional posterior summarization strategy is reliable.
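As an illustration of why an Indian buffet process prior leaves K unbounded, the following is a minimal sketch of the standard sequential IBP construction for a binary assignment matrix Z. It covers only the prior draw, not the count-network likelihood or the Gibbs sampler of the paper; the function name `sample_ibp` and the parameters `num_nodes` and `alpha` are assumptions introduced here for illustration.

```python
import numpy as np

def sample_ibp(num_nodes, alpha, rng):
    """Draw a binary matrix Z (nodes x communities) from an IBP prior.

    Hypothetical helper for illustration; `alpha` is the IBP concentration.
    """
    counts = []  # counts[k] = number of earlier nodes already in community k
    rows = []    # rows[i] = list of community indices chosen by node i
    for i in range(1, num_nodes + 1):
        # Join each existing community k with probability counts[k] / i.
        chosen = [k for k, m in enumerate(counts) if rng.random() < m / i]
        # Open Poisson(alpha / i) brand-new communities.
        n_new = int(rng.poisson(alpha / i))
        chosen.extend(range(len(counts), len(counts) + n_new))
        counts.extend([0] * n_new)
        for k in chosen:
            counts[k] += 1
        rows.append(chosen)
    # Assemble the binary matrix once the final number of columns is known.
    Z = np.zeros((num_nodes, len(counts)), dtype=int)
    for i, chosen in enumerate(rows):
        for k in chosen:
            Z[i, k] = 1
    return Z

if __name__ == "__main__":
    Z = sample_ibp(num_nodes=20, alpha=2.0, rng=np.random.default_rng(0))
    print(Z.shape)  # the number of columns K is random, not fixed in advance
```

Under this construction the expected number of columns is alpha times the N-th harmonic number, so K grows slowly as more nodes are added, and a node may belong to several communities at once.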
Objectives: We introduce a special form of the Generalized Poisson Distribution. The distribution has one parameter, yet its variance is larger than its mean, a phenomenon known as “overdispersion”. We discuss potential applications of the distribution as a model of counts and, under the assumption of independence, perform statistical inference on the ratio of two means, with generalization to testing the homogeneity of several means. Methods: Bayesian methods depend on the choice of the prior distributions of the population parameters. In this paper, we describe a Bayesian approach for estimation and inference on the parameters of several independent Inflated Poisson (IPD) distributions with two possible priors: the first is the reciprocal of the square root of the Poisson parameter, and the other is a conjugate Gamma prior. The parameters of the Gamma distribution are estimated in the empirical Bayesian framework by maximum likelihood (ML) using the nonlinear mixed model procedure (NLMIXED) in SAS. With these priors we construct the highest posterior confidence intervals on the ratio of two IPD parameters and test the homogeneity of several populations. Results: We encountered convergence problems in estimating the hyperparameters of the posterior distribution using NLMIXED. However, direct maximization of the predictive density produced solutions to the maximum likelihood equations. We apply the methodologies to RNA-seq read count data of gene expression values.
Funding: Supported by the National Basic Research Program of China (973) (2012CB316402); the National Natural Science Foundation of China (Grant Nos. 61332005, 61725205); the Research Project of the North Minzu University (2019XYZJK02, 2019xYZJK05, 2017KJ24, 2017KJ25, 2019MS002); Ningxia first-class discipline and scientific research projects (electronic science and technology, NXYLXK2017A07); the NingXia Provincial Key Discipline Project-Computer Application; and the Provincial Natural Science Foundation of NingXia (NZ17111, 2020AAC03219).