Abstract
Missing data are a common factor that degrades data quality and reduces the reliability of analysis results. In this paper, missing data are filled in by different methods, the imputed data are classified with the D-vine copula classifier, and prediction accuracy is used to analyze how the different missing-data handling methods affect the classifier. First, five commonly used missing-data handling methods and the background of the D-vine copula classifier are introduced. Second, different missing ratios are simulated on real data, and the data are imputed with each of the five methods. Finally, the D-vine copula classifier is applied to the imputed data, and the classification accuracies are compared and analyzed. The study finds that imputed data perform fairly stably on the D-vine copula classifier. When the missing ratio is 5% to 10%, random imputation works well; when the missing ratio is larger, K-nearest-neighbor imputation can be considered first.
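The masking and imputation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are synthetic, cells are removed completely at random, and the ratios 0.05, 0.10 and 0.30 are chosen only for illustration. The function names (mask_completely_at_random, random_impute, knn_impute) are hypothetical. Only the two imputation methods named in the abstract are shown, and the D-vine copula classification step is omitted.

```python
import math
import random


def mask_completely_at_random(rows, ratio, rng):
    """Return a copy of rows with about `ratio` of all cells set to None."""
    n_rows, n_cols = len(rows), len(rows[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    holes = set(rng.sample(cells, round(ratio * len(cells))))
    return [[None if (i, j) in holes else rows[i][j] for j in range(n_cols)]
            for i in range(n_rows)]


def random_impute(rows, rng):
    """Fill each missing cell with a random observed value from the same column.

    Assumes every column keeps at least one observed value.
    """
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [[rng.choice(observed[j]) if r[j] is None else r[j]
             for j in range(n_cols)] for r in rows]


def knn_impute(rows, k=3):
    """Fill each missing cell with the column mean over the k nearest donor rows.

    Distance is the root mean squared difference over columns observed in both
    rows. If no donor shares a column, fall back to the observed column mean.
    """
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []
            for other in rows:
                if other[j] is None:
                    continue  # a donor must have column j observed
                shared = [(row[c] - other[c]) ** 2 for c in range(n_cols)
                          if row[c] is not None and other[c] is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [value for _, value in candidates[:k]]
            if not nearest:
                nearest = [r[j] for r in rows if r[j] is not None]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Synthetic stand-in for the real data set used in the paper (assumption).
rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]

for ratio in (0.05, 0.10, 0.30):  # illustrative missing ratios
    masked = mask_completely_at_random(data, ratio, rng)
    by_random = random_impute(masked, rng)
    by_knn = knn_impute(masked, k=3)
    n_missing = sum(value is None for r in masked for value in r)
    print(f"ratio={ratio:.2f} missing cells={n_missing}")
```

In a full comparison, each imputed data set would then be passed to a classifier and the resulting accuracies compared across methods and missing ratios.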
Authors
杨光
王蕾
付志慧
YANG Guang; WANG Lei; FU Zhihui (College of Mathematics and Systems Science, Shenyang Normal University, Shenyang 110034, China; College of Mathematics and Statistics, Minnan Normal University, Zhangzhou 363000, China)
Source
《沈阳师范大学学报(自然科学版)》
CAS
2021, No. 1, pp. 35-38 (4 pages)
Journal of Shenyang Normal University: Natural Science Edition
Funding
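Scientific Research Funding Project of the Education Department of Liaoning Province (LJC201914)
Natural Science Foundation Project of the Science and Technology Department of Liaoning Province (2019MS285).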