Influence of different missing data techniques on the D-vine copula classifier

Abstract: Missing data is a common factor that degrades data quality and reduces the reliability of analysis results. In this paper, missing data are imputed by different methods, the imputed data are classified with the D-vine copula classifier, and prediction accuracy is used to analyze how the different missing-data handling methods affect the classifier. First, five commonly used missing-data handling methods and the background of the D-vine copula classifier are introduced. Second, using real data, different missing proportions are simulated and the data are imputed with each of the five methods. Finally, the D-vine copula classifier is applied to the imputed data and the classification accuracies are compared. The study finds that the imputed data perform fairly stably on the D-vine copula classifier. When the missing proportion is 5% to 10%, random imputation handles the missing data well; when the missing proportion is larger, K nearest neighbor imputation can be considered first.
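The simulate-then-impute step described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names (`mask_at_random`, `random_impute`, `knn_impute`) are hypothetical, random imputation is assumed to mean drawing from the observed values of the same variable, and K nearest neighbor imputation is assumed to average the k closest rows under a Euclidean distance over shared observed variables. The D-vine copula classifier itself is not included.

```python
import math
import random


def mask_at_random(rows, rate, rng):
    """Return a copy of rows in which each cell is set to None with probability `rate`."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def random_impute(rows, rng):
    """Fill each missing cell with a value drawn from the observed values of its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(row)]
        for row in rows
    ]


def knn_impute(rows, k=3):
    """Fill each missing cell with the column mean over the k nearest donor rows.

    Assumes every column has at least one observed value in some other row.
    """

    def distance(a, b):
        # Root mean squared difference over the variables observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Donors are the other rows that have column j observed, nearest first.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            filled[i][j] = sum(val for _, val in donors) / len(donors)
    return filled


if __name__ == "__main__":
    rng = random.Random(0)
    complete = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]
    for rate in (0.05, 0.10, 0.20):
        incomplete = mask_at_random(complete, rate, rng)
        n_missing = sum(v is None for row in incomplete for v in row)
        by_random = random_impute(incomplete, rng)
        by_knn = knn_impute(incomplete, k=3)
        print(rate, n_missing, len(by_random), len(by_knn))
```

In a study of this kind, each imputed data set would then be passed to the classifier and the resulting accuracies compared across methods and missing proportions.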

Authors: YANG Guang; WANG Lei; FU Zhihui (College of Mathematics and Systems Science, Shenyang Normal University, Shenyang 110034, China; College of Mathematics and Statistics, Minnan Normal University, Zhangzhou 363000, China)

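Affiliations: College of Mathematics and Systems Science, Shenyang Normal University; College of Mathematics and Statistics, Minnan Normal University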

Source: Journal of Shenyang Normal University (Natural Science Edition), CAS, 2021, No. 1, pp. 35-38 (4 pages)

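Funding: Scientific Research Fund Project of the Education Department of Liaoning Province (LJC201914); Natural Science Foundation Project of the Science and Technology Department of Liaoning Province (2019MS285).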

Keywords: missing data; D-vine copula; classifier; K nearest neighbor imputation

Classification code: TP274 [Automation and Computer Technology: Detection Technology and Automatic Devices]



