A new three-parameter discrete distribution called the zero-inflated cosine geometric (ZICG) distribution is proposed for the first time herein. It can be used to analyze over-dispersed count data with excess zeros. The basic statistical properties of the new distribution, such as the moment generating function, mean, and variance, are presented. Furthermore, confidence intervals for the parameters of the ZICG distribution are constructed by using the Wald, Bayesian, and highest posterior density (HPD) methods. Their efficacies were investigated by using both simulation and real-world data comprising the number of daily COVID-19 positive cases at the Olympic Games in Tokyo 2020. The results show that the HPD interval performed better than the other methods in terms of coverage probability and average length in most cases studied.
A model of tree mortality across diameter classes is a useful tool for predicting changes in stand structure. Mortality data commonly contain a large fraction of zeros, so general discrete models show larger errors. Based on the traditional Poisson model and the negative binomial model, different forms of zero-inflated and hurdle models were applied to data from spruce-fir mixed forests to simulate the number of dead trees. By comparing the residuals and Vuong test statistics, the zero-inflated negative binomial model performed best. A random effect was added to improve the model accuracy; however, the mixed-effects zero-inflated model did not show increased advantages. According to the model principle, the zero-inflated negative binomial model was the most suitable, indicating that the "0" events in this study, mainly from the sample "0", i.e., the zero mortality data, are largely due to the limitations of the experimental design and sample selection. These results also show that the number of dead trees in a diameter class is positively correlated with the number of trees in that class and the mean stand diameter, and inversely related to class size and to the slope and aspect of the site.
Objective: Sub-health status has progressively gained more attention from both medical professionals and the public. Treating the number of sub-health symptoms as count data rather than dichotomous data helps to analyze findings in the sub-healthy population completely and accurately. This study aims to compare the goodness of fit of count outcome models to identify the optimum model for sub-health studies. Methods: The study sample was derived from a large-scale population survey on physiological and psychological constants conducted from 2007 to 2011 in 4 provinces and 2 autonomous regions in China. We constructed four count outcome models using SAS: the Poisson model, negative binomial (NB) model, zero-inflated Poisson (ZIP) model, and zero-inflated negative binomial (ZINB) model. The number of sub-health symptoms was used as the main outcome measure. The alpha dispersion parameter and the O test were used to identify over-dispersed data, and the Vuong test was used to evaluate the excess of zero counts. The goodness of fit of the regression models was assessed by predictive probability curves and likelihood ratio test statistics. Results: Of all 78 307 respondents, 38.53% reported no sub-health symptoms. The mean number of sub-health symptoms was 2.98, and the standard deviation was 3.72. The statistic O in the over-dispersion test was 720.995 (P<0.001); the estimated alpha was 0.618 (95% CI: 0.600-0.636) in the comparison of the ZINB and ZIP models; the Vuong test statistic Z was 45.487. These results indicated over-dispersion of the data and excessive zero counts in this sub-health study. The ZINB model had the largest log likelihood (-167 519), the smallest Akaike information criterion (335 112), and the smallest Bayesian information criterion (335 455), indicating the best goodness of fit. The predictive probabilities of the ZINB model fitted the observed counts best for most counts. The logit part of the ZINB model showed that age, sex, occupation, smoking, alcohol drinking, ethnicity, and obesity were determinants of the presence of sub-health symptoms; the negative binomial part of the ZINB model showed that sex, occupation, smoking, alcohol drinking, ethnicity, marital status, and obesity had significant effects on the severity of sub-health. Conclusions: All tests for goodness of fit and the predictive probability curve produced the same finding: the ZINB model was the optimum model for exploring the factors influencing sub-health symptoms.
Many researchers have discussed zero-inflated univariate distributions. These univariate models are not suitable for modeling events that involve different types of counts or defects. To model several types of defects, the multivariate Poisson model is one appropriate choice. It can be further modified to incorporate inflation at zero, giving the multivariate zero-inflated Poisson distribution. In the present article, we introduce a new bivariate zero-inflated power series distribution and discuss inference related to the parameters involved in the model. We also discuss inference related to the bivariate zero-inflated Poisson distribution. The model has been applied to real-life data. An extension to the k-variate zero-inflated power series distribution is also discussed.
This paper discusses the estimation of parameters in the zero-inflated Poisson (ZIP) model by the method of moments. The method of moments estimators (MMEs) are analytically compared with the maximum likelihood estimators (MLEs). The results of a modest simulation study are presented.
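As a rough illustration of the method of moments for the ZIP model, the sketch below matches the sample mean and variance to the ZIP moments. It assumes the common parameterization in which a structural zero occurs with probability pi and a Poisson(lam) count occurs otherwise, so that the mean is (1 - pi) * lam and the variance is (1 - pi) * lam * (1 + pi * lam). The function name `zip_moment_estimates` is hypothetical, and the estimators analyzed in the paper may differ in detail.

```python
def zip_moment_estimates(counts):
    """Method-of-moments estimates (pi_hat, lam_hat) for a zero-inflated Poisson.

    Assumed parameterization (not taken from the paper):
        mean     = (1 - pi) * lam
        variance = (1 - pi) * lam * (1 + pi * lam)
    Solving these two equations for pi and lam gives the formulas below.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(counts) / n
    if mean <= 0:
        raise ValueError("sample mean must be positive")
    # Population-style variance (divide by n) to match the moment equations.
    var = sum((x - mean) ** 2 for x in counts) / n
    lam_hat = mean + var / mean - 1.0
    if lam_hat <= 0:
        raise ValueError("moment equations have no valid solution for this sample")
    # Equivalent to (var - mean) / (var + mean**2 - mean).
    pi_hat = (var / mean - 1.0) / lam_hat
    return pi_hat, lam_hat


if __name__ == "__main__":
    sample = [0, 0, 0, 0, 2, 2, 4, 4]
    pi_hat, lam_hat = zip_moment_estimates(sample)
    print(f"pi_hat={pi_hat:.4f}, lam_hat={lam_hat:.4f}")
```

When the sample variance is smaller than the sample mean, `pi_hat` comes out negative, which signals that the data show no zero inflation under this model; a practical implementation would need a rule for that case.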
In a typical Kenyan HIV clinical setting, many zeros are likely to be registered during the routine monthly data collection of new HIV infections among HIV-exposed infants (HEI). This is attributed to the implementation of prevention of mother-to-child transmission (PMTCT) policies. However, even though the PMTCT policy is applied uniformly across all public health facilities, implementation naturally differs from facility to facility because of differences in health systems and infrastructure. This leads to structured zeros among reported positive HEI (where PMTCT implementation is optimum) and non-structured zeros among reported positive HEI (where PMTCT implementation is not optimum). Hence the classical zero-inflated and hurdle models, which do not account for the abundance of structured and non-structured zeros in the data, can give misleading results. The purpose of this study is to systematically compare the performance of various zero-inflated models, with an application to HEI in the context of structured and unstructured zeros. We revisit zero-inflated, hurdle, Poisson, and negative binomial count models and conduct simulations by varying the sample size and the level of zero abundance. Results from the simulation study and the real data analysis of exposed infant diagnosis show the negative binomial emerging as the best performing model when fitting data with both structured and non-structured zeros under various settings.
The zero-inflated Poisson model has found a wide variety of applications in recent years in statistical analyses of count data, especially in count regression models. The zero-inflated Poisson model is characterized in this paper through a linear differential equation satisfied by its probability generating function [1] [2].
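To make the idea of such a characterization concrete, the short derivation below shows one linear differential equation satisfied by the ZIP probability generating function. It assumes the common parameterization with zero-inflation probability $\pi$ and Poisson rate $\lambda$; the symbols and the exact form of the equation are assumptions and may not match those used in the paper.

```latex
% Assumed ZIP parameterization: P(X=0) = \pi + (1-\pi)e^{-\lambda},
% P(X=k) = (1-\pi)e^{-\lambda}\lambda^{k}/k! for k \ge 1.
\begin{align*}
G(s) &= \mathbb{E}\left[s^{X}\right] = \pi + (1-\pi)\,e^{\lambda(s-1)}, \\
G'(s) &= \lambda(1-\pi)\,e^{\lambda(s-1)} = \lambda\bigl(G(s) - \pi\bigr), \\
G(1) &= 1 .
\end{align*}
```

Conversely, solving the first-order linear equation $G'(s) = \lambda\,(G(s) - \pi)$ with the condition $G(1) = 1$ gives back $G(s) = \pi + (1-\pi)e^{\lambda(s-1)}$, which is why an equation of this kind can characterize the distribution.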
Zero-inflated distributions are common in statistical problems where there is interest in testing homogeneity of two or more independent groups. Often, the underlying distribution that has an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparisons of the groups in terms of their respective percentiles may be appropriate, as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes but may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with the comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneous testing of these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size necessary to achieve suitable test performance in specific examples.
Zero-inflated negative binomial distribution is characterized in this paper through a linear differential equation satisfied by its probability generating function.
Empirical estimates of power and Type I error can be misleading if a statistical test does not perform at the stated rejection level under the null hypothesis. We employed the permutation test to control the empirical Type I errors for zero-inflated exponential distributions. Based on four statistical tests, the simulation results indicated that the permutation test can be used effectively to keep the Type I errors near the nominal level even when the sample sizes are small. Our results attest to the permutation test being a valuable adjunct to the current statistical methods for comparing distributions with underlying zero-inflated data structures.
A new two-parameter count distribution is derived starting with probabilistic arguments around the gamma function and the digamma function. This model is a generalization of the Poisson model with a noteworthy assortment of qualities. For example, the mean is the main model parameter; any possible non-trivial variance or zero probability can be attained by changing the other model parameter; and all distributions are visually natural-shaped. Thus, exact modeling to any degree of over/under-dispersion or zero-inflation/deflation is possible.
The occurrence of lightning-induced forest fires during a time period is count data featuring over-dispersion (i.e., variance larger than the mean) and a high frequency of zero counts. In this study, we used six generalized linear models to examine the relationship between the occurrence of lightning-induced forest fires and meteorological factors in the Northern Daxing'an Mountains of China. The six models were the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson hurdle (PH), and negative binomial hurdle (NBH) models. Goodness of fit was compared and tested among the six models using the Akaike information criterion (AIC), sum of squared errors, likelihood ratio test, and Vuong test. The predictive performance of the models was assessed and compared using independent validation data obtained by the data-splitting method. Based on the model AIC, the ZINB model best fitted the fire occurrence data, followed by (in order of increasing AIC) the NBH, ZIP, NB, PH, and Poisson models. The ZINB model was also best for predicting either zero counts or positive counts (>1). The two hurdle models (PH and NBH) were better than the ZIP, Poisson, and NB models for predicting positive counts, but worse than these three models for predicting zero counts. Thus, the ZINB model was the first choice for modeling the occurrence of lightning-induced forest fires in this study, which implied that the excessive zero counts of lightning-induced fires came from both structural and sampling zeros.
Objectives: This study empirically assesses the impact of changes in women's characteristics, empowerment, and the availability and quality of health services on women's decision to use antenatal care (ANC) and the frequency of that use during the period 2000-2008. Study Design: The study is a cross-sectional analytical study using the 2000 and 2008 Egypt Demographic and Health Surveys. Methods: The assessment is conducted using zero-inflated negative binomial regression. In addition, factor analysis is used to construct some of the explanatory variables, such as the indicators of women's empowerment and of the availability and quality of health services. Results: Utilization of antenatal health care services improved greatly from 2000 to 2008. Availability of health services is one of the main determinants of the number of antenatal care visits in 2008. The wealth index and the quality of health services play an important role in raising the level of antenatal care utilization in 2000 and 2008. However, the impact of terminated pregnancy on receiving ANC increased over time. Conclusions: Further research on the determinants of antenatal health care utilization is needed, using more up-to-date measures of women's empowerment and of the availability and quality of health services. In order to improve the provision of antenatal health care services, it is important to understand the barriers to antenatal health care utilization. Therefore, it is advisable to collect information from women about their reasons for not receiving antenatal care.
Road crash prediction models are very useful tools in highway safety, given their potential for determining both the frequency and the severity of crashes. Crash frequency models predict the number of crashes that would occur on a specific road segment or intersection in a time period, while crash severity models generally explore the relationship between crash injury severity and contributing factors such as driver behavior, vehicle characteristics, roadway geometry, and road-environment conditions. Effective interventions to reduce the crash toll include the design of safer infrastructure and the incorporation of road safety features into land-use and transportation planning; improvement of vehicle safety features; improvement of post-crash care for victims of road crashes; and improvement of driver behavior, such as setting and enforcing laws relating to key risk factors and raising public awareness. Despite the great efforts that transportation agencies put into preventive measures, the annual number of traffic crashes has not yet decreased significantly. For instance, 35,092 traffic fatalities were recorded in the US in 2015, an increase of 7.2% compared with the previous year. Given this trend, this paper presents an overview of the road crash prediction models used by transportation agencies and researchers, to give a better understanding of the techniques used in predicting road accidents and of the risk factors that contribute to crash occurrence.
In this paper, we discuss some important aspects of the bivariate alternative zero-inflated logarithmic series distribution (BAZILSD), of which the marginals are the alternative zero-inflated logarithmic series distributions of Kumar and Riyaz (2015. An alternative version of zero-inflated logarithmic series distribution and some of its applications. Journal of Statistical Computation and Simulation, 85(6), 1117-1127). We study some important properties of the distribution by deriving expressions for its probability mass function, factorial moments, and conditional probability generating functions, and recursion formulae for its probabilities, raw moments, and factorial moments. The parameters of the BAZILSD are estimated by the method of maximum likelihood, and certain test procedures are also considered. Further, certain real-life data applications are cited to illustrate the usefulness of the model. A simulation study is conducted to assess the performance of the maximum likelihood estimators of the parameters of the BAZILSD.
Count data with excess zeros encountered in many applications often exhibit extra variation. Therefore, the zero-inflated Poisson (ZIP) model may fail to fit such data. In this paper, a zero-inflated double Poisson (ZIDP) model, which is a generalization of the ZIP model, is studied, and score tests for the significance of dispersion and zero-inflation in the ZIDP model are developed. This work also develops homogeneity tests for the dispersion and/or zero-inflation parameter, and the corresponding score test statistics are obtained. One numerical example is given to illustrate our methodology, and the properties of the score test statistics are investigated through Monte Carlo simulations.
Count data with excess zeros are often encountered in medical, biomedical, and public health applications. In this paper, an extension of zero-inflated Poisson mixed regression models is presented for dealing with multilevel data sets, referred to as hierarchical mixture zero-inflated Poisson mixed regression models. A stochastic EM algorithm is developed for obtaining the ML estimates of the parameters of interest, and models with different numbers of latent classes are compared through the BIC criterion. An application to the analysis of count data from a Shanghai Adolescence Fitness Survey and a simulation study illustrate the usefulness and effectiveness of our methodologies.
We focus on the COM-type negative binomial distribution with three parameters, which belongs to the COM-type (a, b, 0) class of distributions and to the family of equilibrium distributions of arbitrary birth-death processes. We show abundant distributional properties such as overdispersion and underdispersion, log-concavity, log-convexity (infinite divisibility), pseudo compound Poisson, stochastic ordering, and asymptotic approximation. Some characterizations, including the sum of equicorrelated geometrically distributed random variables, the conditional distribution, the limit distribution of the COM-negative hypergeometric distribution, and Stein's identity, are given for theoretical properties. The COM-negative binomial distribution was applied to overdispersed and ultrahigh zero-inflated data sets. With the aid of ratio regression, we employ the maximum likelihood method to estimate the parameters, and the goodness of fit is evaluated by the discrete Kolmogorov-Smirnov test.
There are two weaknesses in current research into human casualties of ship collisions. One is that the range of injuries or fatalities is restricted to the maximum number of casualties in a particular sample, which may not cover all the possible numbers of casualties in the future. The International Maritime Organization (IMO) employed the injured or dead percentage of all the persons on board to represent casualties, but it only provided several discrete values to quantify human losses in different scenarios. The other is that the assumption that injuries or fatalities follow a certain distribution, such as the negative binomial or Poisson distribution, has yet to be statistically tested. First, this study treats the casualty rates, including injury and fatality rates, as random variables taking values from 0 to 1. Then, the distributions of the variables are investigated using historical data, which contain many zeros. Zero-inflated models have proved to be effective in processing data with inflated zeros. Furthermore, the probability density of the variables decreases rapidly as the casualty rate becomes larger. Thus, a zero-inflated exponential distribution is assumed to fit the data. The parameters of the zero-inflated exponential distribution are calibrated by the maximum likelihood estimation (MLE) method. Finally, the assumption is tested by a chi-square test. The zero-inflated exponential distribution can be used to generate human losses as a part of the consequences in the simulation of ship collision risk.
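The maximum likelihood step for a zero-inflated exponential distribution can be sketched in a few lines, because the estimates have a closed form. The sketch assumes a mixture with a point mass pi at zero and an exponential density with rate lam for positive values; the function name `zie_mle` and this rate parameterization are assumptions made for illustration, not details taken from the paper.

```python
def zie_mle(rates):
    """Closed-form MLE (pi_hat, lam_hat) for a zero-inflated exponential model.

    Assumed model (illustrative, not taken from the paper):
        P(X = 0) = pi
        density for x > 0: (1 - pi) * lam * exp(-lam * x)
    The log-likelihood separates into a part in pi and a part in lam, so
        pi_hat  = (number of zeros) / n
        lam_hat = (number of positives) / (sum of positives).
    """
    n = len(rates)
    if n == 0:
        raise ValueError("need at least one observation")
    if any(x < 0 for x in rates):
        raise ValueError("observations must be non-negative")
    positives = [x for x in rates if x > 0]
    if not positives:
        raise ValueError("need at least one positive observation to estimate lam")
    pi_hat = (n - len(positives)) / n
    lam_hat = len(positives) / sum(positives)
    return pi_hat, lam_hat


if __name__ == "__main__":
    # Hypothetical casualty rates: three collisions without casualties, two with.
    pi_hat, lam_hat = zie_mle([0.0, 0.0, 0.0, 0.1, 0.3])
    print(f"pi_hat={pi_hat:.3f}, lam_hat={lam_hat:.3f}")
```

Note that the exponential density is not truncated at 1 in this sketch, even though a casualty rate cannot exceed 1; whether the paper applies such a truncation is not stated in the abstract.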
Funding (ZICG distribution study): support from the National Science, Research and Innovation Fund (NSRF) and King Mongkut's University of Technology North Bangkok (Grant No. KMUTNB-FF-65-22).
Funding (tree mortality study): supported by the "948" Project of the State Forestry Administration of China (No. 2013-4-66).
Funding (sub-health symptoms study): supported by the Basic Performance Key Project, the Ministry of Science and Technology of the People's Republic of China (No. 2006FY110300).
文摘Many researchers have discussed zero-inflated univariate distributions. These univariate models are not suitable, for modeling events that involve different types of counts or defects. To model several types of defects, multivariate Poisson model is one of the appropriate model. This can further be modified to incorporate inflation at zero and we can have multivariate zero-inflated Poisson distribution. In the present article, we introduce a new Bivariate Zero Inflated Power Series Distribution and discuss inference related to the parameters involved in the model. We also discuss the inference related to Bivariate Zero Inflated Poisson Distribution. The model has been applied to a real life data. Extension to k-variate zero inflated power series distribution is also discussed.
文摘This paper discusses the estimation of parameters in the zero-inflated Poisson (ZIP) model by the method of moments. The method of moments estimators (MMEs) are analytically compared with the maximum likelihood estimators (MLEs). The results of a modest simulation study are presented.
文摘In a typical Kenyan HIV clinical setting, there is a likelihood of registering many zeros during the routine monthly data collection of new HIV infections among HIV exposed infants (HEI). This is attributed to the implementation of the prevention of mother to child transmission (PMTCT) policies. However, even though the PMTCT policy is implemented uniformly across all public health facilities, implementation naturally differs from every facility due to differential health systems and infrastructure. This leads to structured zero among reported positive HEI (where PMTCT implementation is optimum) and non-structured zero among reported positive HEI (where PMTCT implementation is not optimum). Hence the classical zero-inflated and hurdle models that do not account for the abundance of structured and non-structured zeros in the data can give misleading results. The purpose of this study is to systematically compare performance of the various zero-inflated models with an application to HIV Exposed Infants (HEI) in the context of structured and unstructured zeros. We revisit zero-inflated, hurdle models, Poisson and negative binomial count models and conduct the simulations by varying sample size and levels of abundance zeros. Results from simulation study and real data analysis of exposed infant diagnosis show the negative binomial emerging as the best performing model when fitting data with both structured and non-structured zeros under various settings.
文摘Zero-Inflated Poisson model has found a wide variety of applications in recent years in statistical analyses of count data, especially in count regression models. Zero-Inflated Poisson model is characterized in this paper through a linear differential equation satisfied by its probability generating function [1] [2].
文摘Zero-inflated distributions are common in statistical problems where there is interest in testing homogeneity of two or more independent groups. Often, the underlying distribution that has an inflated number of zero-valued observations is asymmetric, and its functional form may not be known or easily characterized. In this case, comparisons of the groups in terms of their respective percentiles may be appropriate as these estimates are nonparametric and more robust to outliers and other irregularities. The median test is often used to compare distributions with similar but asymmetric shapes but may be uninformative when there are excess zeros or dissimilar shapes. For zero-inflated distributions, it is useful to compare the distributions with respect to their proportion of zeros, coupled with the comparison of percentile profiles for the observed non-zero values. A simple chi-square test for simultaneous testing of these two components is proposed, applicable to both continuous and discrete data. Results of simulation studies are reported to summarize empirical power under several scenarios. We give recommendations for the minimum sample size which is necessary to achieve suitable test performance in specific examples.
Abstract: The zero-inflated negative binomial distribution is characterized in this paper through a linear differential equation satisfied by its probability generating function.
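To illustrate the kind of equation involved, assume a structural-zero weight \pi and a negative binomial component with size r and success probability p, writing q = 1 - p (this notation is an assumption made here; the abstract does not fix one). A short derivation gives:

```latex
\begin{align*}
G(s) &= \pi + (1-\pi)\left(\frac{p}{1-qs}\right)^{r}, \qquad |s| < \frac{1}{q}, \\
G'(s) &= (1-\pi)\,p^{r}\,r q\,(1-qs)^{-r-1}
       = \frac{r q}{1-qs}\,\bigl(G(s)-\pi\bigr), \\
\text{so}\quad (1-qs)\,G'(s) - r q\,G(s) &= -r q\,\pi, \qquad G(1) = 1.
\end{align*}
```

The equation is linear in G with a coefficient that depends on s. The paper's own statement may be arranged differently.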
Abstract: Empirical estimates of power and Type I error can be misleading if a statistical test does not perform at the stated rejection level under the null hypothesis. We employed the permutation test to control the empirical Type I errors for zero-inflated exponential distributions. The simulation results, based on four statistical tests, indicated that the permutation test can be used effectively to keep the Type I errors near the nominal level even when the sample sizes are small. Our results attest to the permutation test being a valuable adjunct to current statistical methods for comparing distributions with underlying zero-inflated data structures.
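A minimal sketch of a two-sample permutation test on zero-inflated exponential data follows. The abstract does not name its four test statistics, so the absolute difference in sample means is used here purely as an assumption; the function names, sample sizes, and parameter values are also illustrative.

```python
import random


def draw_zero_inflated_exponential(n, zero_prob, rate, rng):
    """Draw n values: 0 with probability zero_prob, otherwise Exponential(rate)."""
    return [0.0 if rng.random() < zero_prob else rng.expovariate(rate)
            for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the absolute difference in means.

    The statistic is an illustrative choice, not one taken from the paper.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    observed = abs(mean(x) - mean(y))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (exceed + 1) / (n_perm + 1)


rng = random.Random(42)
group_a = draw_zero_inflated_exponential(15, 0.4, 1.0, rng)
group_b = draw_zero_inflated_exponential(15, 0.4, 1.0, rng)
p_value = permutation_pvalue(group_a, group_b)
print(f"permutation p-value: {p_value:.4f}")
```

Because the reference distribution is built from the data by relabelling, the test does not rely on a large-sample approximation, which is the property the abstract exploits for small samples.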
Abstract: A new two-parameter count distribution is derived, starting with probabilistic arguments around the gamma function and the digamma function. This model is a generalization of the Poisson model with a noteworthy assortment of qualities. For example, the mean is the main model parameter; any possible non-trivial variance or zero probability can be attained by changing the other model parameter; and all distributions are visually natural-shaped. Thus, exact modeling of any degree of over-/under-dispersion or zero-inflation/deflation is possible.
Funding: Funded by the Asia–Pacific Forests Net (APFNET/2010/FPF/001) and the National Natural Science Foundation of China (Grant No. 31400552).
Abstract: The occurrence of lightning-induced forest fires during a time period is count data featuring over-dispersion (i.e., the variance is larger than the mean) and a high frequency of zero counts. In this study, we used six generalized linear models to examine the relationship between the occurrence of lightning-induced forest fires and meteorological factors in the Northern Daxing'an Mountains of China. The six models were the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson hurdle (PH), and negative binomial hurdle (NBH) models. Goodness of fit was compared and tested among the six models using the Akaike information criterion (AIC), sum of squared errors, likelihood ratio test, and Vuong test. The predictive performance of the models was assessed and compared on independent validation data obtained by the data-splitting method. Based on AIC, the ZINB model fitted the fire occurrence data best, followed (in order of increasing AIC) by the NBH, ZIP, NB, PH, and Poisson models. The ZINB model was also best for predicting either zero counts or positive counts (>1). The two hurdle models (PH and NBH) were better than the ZIP, Poisson, and NB models for predicting positive counts, but worse than these three models for predicting zero counts. Thus, the ZINB model was the first choice for modeling the occurrence of lightning-induced forest fires in this study, which implied that the excess zero counts of lightning-induced fires came from both structural and sampling zeros.
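The AIC comparison described above can be sketched for the two simplest models, Poisson and ZIP, without covariates. This is not the study's procedure: the counts below are made-up toy data, the ZIP model is fitted with a basic EM iteration chosen here for brevity, and all function names are hypothetical.

```python
import math


def poisson_loglik(counts, lam):
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)


def zip_loglik(counts, pi, lam):
    total = 0.0
    for x in counts:
        if x == 0:
            # A zero is either structural (pi) or a Poisson zero.
            total += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            total += (math.log(1 - pi) + x * math.log(lam)
                      - lam - math.lgamma(x + 1))
    return total


def fit_zip_em(counts, max_iter=1000, tol=1e-10):
    """EM fit of an intercept-only ZIP model (needs zeros and positives)."""
    n = len(counts)
    n_zero = sum(1 for x in counts if x == 0)
    total = sum(counts)
    pi, lam = 0.5 * n_zero / n, total / (n - n_zero)  # crude starting values
    for _ in range(max_iter):
        # E-step: probability that an observed zero is structural.
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        expected_structural = n_zero * w
        # M-step: update the mixing weight and the Poisson mean.
        new_pi = expected_structural / n
        new_lam = total / (n - expected_structural)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


# Toy counts with many zeros (illustrative only, not data from the study).
counts = [0] * 60 + [1] * 6 + [2] * 9 + [3] * 9 + [4] * 7 + [5] * 4 + [6] * 3 + [7] * 2

lam_poisson = sum(counts) / len(counts)  # Poisson MLE is the sample mean
aic_poisson = aic(poisson_loglik(counts, lam_poisson), 1)

pi_hat, lam_hat = fit_zip_em(counts)
aic_zip = aic(zip_loglik(counts, pi_hat, lam_hat), 2)

print(f"Poisson AIC: {aic_poisson:.1f}, ZIP AIC: {aic_zip:.1f}")
```

The model with the smaller AIC is preferred. A full replication would add covariates and the NB, ZINB, and hurdle variants, for which an established statistics package is the more practical route.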
Abstract: Objectives: This study empirically assesses the impact of changes in women's characteristics, empowerment, and the availability and quality of health services on women's decisions to use antenatal care (ANC) and on the frequency of that use during the period 2000-2008. Study Design: The study is a cross-sectional analytical study using the 2000 and 2008 Egypt Demographic and Health Surveys. Methods: The impact is assessed using zero-inflated negative binomial regression. In addition, factor analysis is used to construct some of the explanatory variables, such as the indicators of women's empowerment and of the availability and quality of health services. Results: Utilization of antenatal health care services improved greatly from 2000 to 2008. Availability of health services is one of the main determinants of the number of antenatal care visits in 2008. Wealth index and quality of health services play an important role in raising the level of antenatal care utilization in 2000 and 2008. However, the impact of terminated pregnancy on receiving ANC increased over time. Conclusions: Further research on the determinants of antenatal health care utilization is needed, using more up-to-date measures of women's empowerment and of the availability and quality of health services. To improve the provision of antenatal health care services, it is important to understand the barriers to their utilization. Therefore, it is advisable to collect information from women about their reasons for not receiving antenatal care.
Abstract: Road crash prediction models are very useful tools in highway safety, given their potential for determining both the frequency of crash occurrence and the degree of crash severity. Crash frequency models predict the number of crashes that would occur on a specific road segment or intersection in a given time period, while crash severity models generally explore the relationship between crash injury severity and contributing factors such as driver behavior, vehicle characteristics, roadway geometry, and road-environment conditions. Effective interventions to reduce the crash toll include the design of safer infrastructure and the incorporation of road safety features into land-use and transportation planning; improvement of vehicle safety features; improvement of post-crash care for victims of road crashes; and improvement of driver behavior, for example by setting and enforcing laws relating to key risk factors and by raising public awareness. Despite the great efforts that transportation agencies put into preventive measures, the annual number of traffic crashes has not yet decreased significantly. For instance, 35,092 traffic fatalities were recorded in the US in 2015, an increase of 7.2% compared with the previous year. Given this trend, this paper presents an overview of the road crash prediction models used by transportation agencies and researchers, to give a better understanding of the techniques used in predicting road accidents and of the risk factors that contribute to crash occurrence.
Abstract: In this paper, we discuss some important aspects of the bivariate alternative zero-inflated logarithmic series distribution (BAZILSD), of which the marginals are the alternative zero-inflated logarithmic series distributions of Kumar and Riyaz (2015. An alternative version of zero-inflated logarithmic series distribution and some of its applications. Journal of Statistical Computation and Simulation, 85(6), 1117-1127). We study some important properties of the distribution by deriving expressions for its probability mass function, factorial moments, and conditional probability generating functions, and recursion formulae for its probabilities, raw moments, and factorial moments. The parameters of the BAZILSD are estimated by the method of maximum likelihood, and certain test procedures are also considered. Further, certain real-life data applications are cited to illustrate the usefulness of the model. A simulation study is conducted to assess the performance of the maximum likelihood estimators of the parameters of the BAZILSD.
Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. 11271193 and 11571073, and the Natural Science Foundation of Jiangsu Province under Grant No. BK20141326.
Abstract: Count data with excess zeros, encountered in many applications, often exhibit extra variation. Therefore, the zero-inflated Poisson (ZIP) model may fail to fit such data. In this paper, a zero-inflated double Poisson (ZIDP) model, which is a generalization of the ZIP model, is studied, and score tests for the significance of dispersion and zero-inflation in the ZIDP model are developed. This work also develops homogeneity tests for the dispersion and/or zero-inflation parameters, and the corresponding score test statistics are obtained. One numerical example is given to illustrate our methodology, and the properties of the score test statistics are investigated through Monte Carlo simulations.
Funding: Supported by the National Natural Science Foundation of China (No. 11171105 and No. 11171293) and the National Social Science Foundation of China (No. 10BTJ001).
Abstract: Count data with excess zeros are often encountered in medical, biomedical, and public health applications. In this paper, an extension of zero-inflated Poisson mixed regression models is presented for dealing with multilevel data sets, referred to as hierarchical mixture zero-inflated Poisson mixed regression models. A stochastic EM algorithm is developed for obtaining the ML estimates of the parameters of interest, and models with different numbers of latent classes are compared through the BIC criterion. An application to the analysis of count data from a Shanghai Adolescence Fitness Survey and a simulation study illustrate the usefulness and effectiveness of our methodologies.
Funding: The COM-negative binomial distribution proposed in this work was conceptualized as early as December 2014, when the authors saw the online version of [15]. The authors thank Prof. R. Köhler for mailing them the valuable encyclopedia of discrete univariate distributions [39]. This work was partly supported by the National Natural Science Foundation of China (Grant No. 11201165).
Abstract: We focus on the COM-type negative binomial distribution with three parameters, which belongs to the COM-type (a, b, 0) class of distributions and to the family of equilibrium distributions of an arbitrary birth-death process. We also show abundant distributional properties such as overdispersion and underdispersion, log-concavity, log-convexity (infinite divisibility), pseudo compound Poisson, stochastic ordering, and asymptotic approximation. Some characterizations, including the sum of equicorrelated geometrically distributed random variables, the conditional distribution, the limit distribution of the COM-negative hypergeometric distribution, and Stein's identity, are given as theoretical properties. The COM-negative binomial distribution was applied to overdispersed and ultrahigh zero-inflated data sets. With the aid of ratio regression, we employ the maximum likelihood method to estimate the parameters, and the goodness of fit is evaluated by the discrete Kolmogorov-Smirnov test.
Funding: The Liberal Arts and Social Sciences Foundation of the Ministry of Education in China (No. 19YJCGJW003).
Abstract: There are two weaknesses in current research into human casualties of ship collisions. One is that the range of injuries or fatalities is restricted to the maximum number of casualties in a particular sample, which may not cover all the possible numbers of casualties in the future. The International Maritime Organization (IMO) employed the injured or dead percentage of all the persons on board to represent casualties, but it provided only several discrete values to quantify human losses in different scenarios. The other is that the assumption that the injuries or fatalities follow a certain distribution, such as the negative binomial or Poisson distribution, has not been statistically tested. First, this study treats the casualty rates, including the injury and fatality rates, as random variables taking values in the interval from 0 to 1. Then, the distributions of the variables are investigated using historical data. The historical data contain many zeros, and zero-inflated models have proved effective in processing data with inflated zeros. Furthermore, the probability density of the variables decreases rapidly as the casualty rate becomes larger. Thus, a zero-inflated exponential distribution is assumed to fit the data. The parameters of the zero-inflated exponential distribution are calibrated by the maximum likelihood estimation (MLE) method. Finally, the assumption is tested by the chi-square test. The zero-inflated exponential distribution can be used to generate human losses as part of the consequences in the simulation of ship collision risk.
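For a plain zero-inflated exponential model, the maximum likelihood estimates have a closed form: the zero probability is the observed share of zeros, and the exponential rate is the reciprocal of the mean of the positive values. The sketch below assumes that simple form and ignores the fact that casualty rates are bounded by 1, so it may differ from the calibration in the paper. The sample values are made up for illustration and the function names are hypothetical.

```python
import math


def fit_zero_inflated_exponential(rates):
    """Closed-form MLE for (zero_prob, rate) of a zero-inflated exponential."""
    n = len(rates)
    positives = [r for r in rates if r > 0]
    if not positives:
        raise ValueError("need at least one positive observation")
    zero_prob = (n - len(positives)) / n
    rate = len(positives) / sum(positives)  # reciprocal of the positive mean
    return zero_prob, rate


def log_likelihood(rates, zero_prob, rate):
    """Log-likelihood with a point mass at 0 and an exponential density above 0."""
    total = 0.0
    for r in rates:
        if r == 0:
            total += math.log(zero_prob)
        else:
            total += math.log(1 - zero_prob) + math.log(rate) - rate * r
    return total


# Made-up casualty rates (illustrative only, not data from the study).
sample = [0, 0, 0, 0, 0, 0, 0.02, 0.05, 0.10, 0.23]
zero_prob_hat, rate_hat = fit_zero_inflated_exponential(sample)
print(f"zero probability: {zero_prob_hat:.2f}, exponential rate: {rate_hat:.2f}")
```

Once fitted, a casualty rate can be simulated by drawing 0 with the estimated zero probability and an exponential value otherwise, which is how such a model could feed a collision risk simulation. The chi-square goodness-of-fit step mentioned in the abstract is not shown here.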