A new three-parameter discrete distribution called the zero-inflated cosine geometric (ZICG) distribution is proposed for the first time herein. It can be used to analyze over-dispersed count data with excess zeros. The basic statistical properties of the new distribution, such as the moment generating function, mean, and variance, are presented. Furthermore, confidence intervals for the parameters of the ZICG distribution are constructed using the Wald, Bayesian, and highest posterior density (HPD) methods. Their efficacies were investigated using both simulated data and real-world data comprising the number of daily COVID-19 positive cases at the Tokyo 2020 Olympic Games. The results show that the HPD interval performed better than the other methods in terms of coverage probability and average length in most of the cases studied.
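For orientation, the generic zero-inflation construction that such distributions build on can be written out as below. This is a sketch of the standard mixture form only, not the ZICG probability mass function itself, which the abstract does not state. The symbols $\pi$ (zero-inflation weight), $f$ (baseline count pmf), $\mu$ and $\sigma^2$ (baseline mean and variance) are notation assumed here for illustration.

```latex
% Generic zero-inflated count model: mix a point mass at zero (weight \pi)
% with a baseline count pmf f having mean \mu and variance \sigma^2.
\[
P(X = x) =
\begin{cases}
\pi + (1 - \pi)\, f(0), & x = 0, \\
(1 - \pi)\, f(x),       & x = 1, 2, \ldots
\end{cases}
\qquad 0 \le \pi < 1 .
\]
% Moments follow from E[X] = (1-\pi)\mu and E[X^2] = (1-\pi)(\sigma^2 + \mu^2):
\[
\mathrm{E}[X] = (1 - \pi)\,\mu ,
\qquad
\mathrm{Var}(X) = (1 - \pi)\,\sigma^2 + \pi (1 - \pi)\,\mu^2 .
\]
```

The extra term $\pi(1-\pi)\mu^2$ shows why zero inflation increases dispersion relative to the mean, even when the baseline distribution itself is not over-dispersed.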
In a typical Kenyan HIV clinical setting, many zeros are likely to be registered during the routine monthly data collection of new HIV infections among HIV-exposed infants (HEI). This is attributed to the implementation of prevention of mother-to-child transmission (PMTCT) policies. However, even though the PMTCT policy is implemented uniformly across all public health facilities, implementation naturally differs from facility to facility because of differences in health systems and infrastructure. This leads to structured zeros among reported positive HEI (where PMTCT implementation is optimal) and non-structured zeros among reported positive HEI (where PMTCT implementation is not optimal). Hence, classical zero-inflated and hurdle models that do not account for the abundance of structured and non-structured zeros in the data can give misleading results. The purpose of this study is to systematically compare the performance of various zero-inflated models, with an application to HEI in the context of structured and non-structured zeros. We revisit zero-inflated models, hurdle models, and Poisson and negative binomial count models, and conduct simulations by varying the sample size and the level of zero abundance. Results from the simulation study and from the real-data analysis of exposed infant diagnosis show the negative binomial emerging as the best-performing model when fitting data with both structured and non-structured zeros under various settings.
We focus on the COM-type negative binomial distribution with three parameters, which belongs to the COM-type (a, b, 0) class of distributions and to the family of equilibrium distributions of an arbitrary birth-death process. We also show a range of distributional properties, such as overdispersion and underdispersion, log-concavity, log-convexity (infinite divisibility), the pseudo compound Poisson property, stochastic ordering, and asymptotic approximation. Several characterizations are given as theoretical properties, including the sum of equicorrelated geometrically distributed random variables, the conditional distribution, the limit distribution of the COM-negative hypergeometric distribution, and Stein's identity. The COM-negative binomial distribution was applied to overdispersed and ultrahigh zero-inflated data sets. With the aid of ratio regression, we employ the maximum likelihood method to estimate the parameters, and the goodness of fit is evaluated by the discrete Kolmogorov-Smirnov test.
Background: The recently emerged technology of methylated RNA immunoprecipitation sequencing (MeRIP-seq) sheds light on the study of RNA epigenetics. This new bioinformatics question calls for effective and robust peak-calling algorithms to detect mRNA methylation sites from MeRIP-seq data. Methods: We propose a Bayesian hierarchical model to detect methylation sites from MeRIP-seq data. Our modeling approach includes several important characteristics. First, it models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model. Second, it incorporates a hidden Markov model (HMM) to account for the spatial dependency of neighboring read enrichment. Third, our Bayesian inference allows the proposed model to borrow strength in parameter estimation, which greatly improves model stability when dealing with MeRIP-seq data with a small number of replicates. We use Markov chain Monte Carlo (MCMC) algorithms to simultaneously infer the model parameters in a de novo fashion. The R Shiny demo is available at https://qiwei.shinyapps.io/BaySeqPeak and the R/C++ code is available at https://github.com/liqiwei2000/BaySeqPeak. Results: In simulation studies, the proposed method outperformed the competing methods exomePeak and MeTPeak, especially when an excess of zeros was present in the data. In real MeRIP-seq data analysis, the proposed method identified methylation sites that were more consistent with biological knowledge and had better spatial resolution than the other methods. Conclusions: In this study, we develop a Bayesian hierarchical model to identify methylation peaks in MeRIP-seq data. The proposed method has a competitive edge over existing methods in terms of accuracy, robustness, and spatial resolution.
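Campus networks have become widespread with the development of education informatization, but relevant professional standards and specifications are currently lacking. As a result, the subsystems already built within a campus, and the management systems of different schools, are poorly compatible or incompatible with one another, which makes it difficult to exchange and share information and resources. Against this background, the work studies the Education Management Information System Interoperability Framework (EMIF) issued by the Ministry of Education. Drawing on the features of this specification and introducing the concept of clustering, it uses a client Agent and Zone Integration Server (ZIS) model to build an EMIF-based information-sharing and data-integration platform for distributed heterogeneous environments.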
A new two-parameter count distribution is derived starting with probabilistic arguments around the gamma function and the digamma function. This model is a generalization of the Poisson model with a noteworthy assortment of qualities. For example, the mean is the main model parameter; any possible non-trivial variance or zero probability can be attained by changing the other model parameter; and all distributions are visually natural-shaped. Thus, exact modeling to any degree of over/under-dispersion or zero-inflation/deflation is possible.
Crime risk prediction is helpful for urban safety and citizens' quality of life. However, existing crime studies have focused on coarse-grained prediction and have usually failed to capture the dynamics of urban crime. The key challenge is data sparsity, since 1) not all crimes are recorded, and 2) crimes usually occur with low frequency. In this paper, we propose an effective framework to predict fine-grained and dynamic crime risks on each road using heterogeneous urban data. First, to address the issue of unreported crimes, we propose a cross-aggregation soft-impute (CASI) method. Then, we use a novel crime risk measurement to capture crime dynamics from the perspective of influence propagation, taking into consideration both time-varying and location-varying risk propagation. Based on the dynamically calculated crime risks, we design contextual features (i.e., POI distributions, taxi mobility, demographic features) from various urban data sources, and propose a zero-inflated negative binomial regression (ZINBR) model to predict future crime risks on roads. Experiments using real-world data from New York City show that our framework can accurately predict road crime risks and outperforms other baseline methods.
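Several of the abstracts above rely on a zero-inflated negative binomial model for sparse, over-dispersed counts. The following is a minimal sketch of that probability mass function, assuming the common mean/dispersion parameterization; the function names `nb_pmf` and `zinb_pmf` and the parameters `pi`, `mu`, and `r` are hypothetical and are not taken from any of the listed papers. It does not include the regression part of a ZINBR model, where `pi` and `mu` would depend on covariates.

```python
import math


def nb_pmf(k: int, mu: float, r: float) -> float:
    """Negative binomial pmf with mean mu > 0 and dispersion r > 0.

    The variance is mu + mu**2 / r, so a smaller r means more over-dispersion.
    """
    # Work on the log scale so large counts do not overflow the gamma function.
    log_p = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k: int, pi: float, mu: float, r: float) -> float:
    """Zero-inflated negative binomial pmf.

    With probability pi the count is an extra (structural) zero; otherwise it
    is drawn from the negative binomial, which can itself produce zeros.
    """
    base = (1.0 - pi) * nb_pmf(k, mu, r)
    return pi + base if k == 0 else base


if __name__ == "__main__":
    # Illustrative parameter values only (assumed, not estimated from data).
    pi, mu, r = 0.3, 2.0, 1.5
    print("P(Y = 0), plain NB:", nb_pmf(0, mu, r))
    print("P(Y = 0), ZINB    :", zinb_pmf(0, pi, mu, r))
```

Comparing the two printed zero probabilities shows how the mixing weight `pi` adds zeros on top of those the count distribution already generates; the mean of the mixture shrinks to `(1 - pi) * mu`.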
Funding (ZICG distribution paper): Support from the National Science, Research and Innovation Fund (NSRF) and King Mongkut's University of Technology North Bangkok (Grant No. KMUTNB-FF-65-22).
Funding (COM-negative binomial paper): The COM-negative binomial distribution proposed in this work was first conceptualized as early as December 2014, when the authors saw the online version of [15]. The authors thank Prof. R. KShler for mailing them the valuable encyclopedia of discrete univariate distributions [39]. This work was partly supported by the National Natural Science Foundation of China (Grant No. 11201165).
Funding (crime risk prediction paper): This work was partly supported by the National Natural Science Foundation of China (Grant No. 61772460) and the Ten Thousand Talent Program of Zhejiang Province (2018R52039).