Abstract
Missing samples are a common problem in data mining. They are usually treated as missing completely at random and are either ignored or imputed with equal probability. Some samples, however, are missing not at random but as the result of a selection mechanism. This paper studies the non-random missingness induced by the bonus-malus (no claim discount) system, namely the missing claims data caused by the introduction of deductibles in insurance practice. First, a new interpretation of the missing data model is given in the insurance context. Second, the model is extended in two ways, by a logarithmic transformation and by a data shift, to accommodate the fact that loss amounts are generally positive and are affected by the deductible clause. Finally, claims data for a group of trucks are used to demonstrate the feasibility and practicability of the extended models in motor insurance pricing. The main contributions are as follows: the model is introduced into the insurance field for the first time, and an analysis is proposed of the correlation between the unobservable claim intention and the loss amount subject to non-random missingness; the model is extended by logarithmic transformation and data shift to improve its practicability; zero inflation in the data is divided by cause into two types, arising from the insured's claim intention and from the insurer's deductible clause respectively, in preparation for later pricing of policies based on the deductible clause; a series of properties and theorems of the model are proved; and, based on the newly established models and these theorems, motor insurance data are used to explain in detail the effect of the deductible on the claim probability, the level premium, and the elasticity coefficient.
Source
《保险研究》
CSSCI
Peking University Core Journals
2023, Issue 8, pp. 3-15 (13 pages)
Insurance Studies
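Funding
Southwestern University of Finance and Economics science and technology innovation project "Policy Grouping and Reserve Evaluation Based on Machine Learning" (JBK210301)
Southwestern University of Finance and Economics social science project "Research on Pricing of New Energy Vehicle Insurance Based on Individual Driving Mileage Behavior under a Machine Learning Framework" (JBK2203004).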
Keywords
bonus-malus (no claim discount) system
missing samples
zero inflation
correlation