Abstract
Count data arise in many fields, including medicine, sociology, psychology, insurance, and transportation, and are an important data type. Such data are often over-dispersed, which the ordinary Poisson regression model cannot accommodate, so that model becomes inadequate. This paper studies a class of mixed Poisson distributions designed to fit over-dispersed count data. Building on three existing mixed Poisson distributions, the Poisson-generalized inverse Gaussian (PGIG), Poisson-reciprocal inverse Gaussian (PRIG), and Poisson-inverse Gamma (PIGA) distributions, we exploit the flexibility of GAMLSS (generalized additive models for location, scale and shape) to construct a GAMLSS model in which the response variable is assumed to follow the PIGA distribution. To assess its performance, GAMLSS models under the PIGA, PRIG, and negative binomial (NB) distribution assumptions are applied to vehicle insurance claim frequency data and compared by global deviance, AIC, and BIC. The results show that the PIGA-based model fits the over-dispersed claim frequency data clearly better than the GAMLSS models under the NB and PRIG assumptions, making it an effective model for over-dispersed count data.
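As a rough illustration of two ideas the abstract relies on, the sketch below computes a simple dispersion index (sample variance over sample mean, close to 1 for Poisson data) and the AIC and BIC as penalized versions of the global deviance. The function names and the claim counts are hypothetical and are not taken from the paper. The penalty forms (2 per degree of freedom for AIC, log n per degree of freedom for BIC) are the standard definitions, assumed here to match the criteria the authors used.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Values well above 1 suggest over-dispersion relative to a Poisson model.
    """
    return variance(counts) / mean(counts)


def aic(global_deviance, df):
    """AIC = global deviance + 2 * (effective degrees of freedom)."""
    return global_deviance + 2 * df


def bic(global_deviance, df, n_obs):
    """BIC = global deviance + log(n) * (effective degrees of freedom)."""
    return global_deviance + math.log(n_obs) * df


# Hypothetical claim counts for 20 policies (illustrative only).
claims = [0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 5, 0, 1, 0, 0, 0, 3, 0]

print("dispersion index:", dispersion_index(claims))
# Hypothetical deviance and degrees of freedom for one fitted model.
print("AIC:", aic(100.0, 3))
print("BIC:", bic(100.0, 3, len(claims)))
```

When models are compared on the same data, smaller values of the global deviance, AIC, and BIC indicate a better fit after accounting for model complexity.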
Authors
XU Jiao (徐娇); MA Jiang-hong (马江洪)
School of Sciences, Chang'an University, Xi'an 710064, China
Source
《数理统计与管理》 (Journal of Applied Statistics and Management)
CSSCI; Peking University Core Journal (北大核心)
2024, No. 3, pp. 423-436 (14 pages)
Funding
National Key Research and Development Program of China (2023YFF1304703).
Keywords
mixed Poisson distribution
over-dispersion
Poisson-inverse Gamma distribution
GAMLSS model
vehicle insurance claim frequency data