Abstract
Data sets of Indian liver patients with missing values in multiple variables were constructed at different missing levels. Two imputation methods, multiple imputation and random forest imputation, were used to complete the data sets, which were then used to train a support vector machine (SVM) classifier. The results show that, under the same conditions, the SVM classifier trained on random-forest-imputed data generally classified better than the one trained on multiply imputed data. Using imputation to make incomplete data usable in machine learning methods is a promising approach.
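The first step described in the abstract, constructing data sets at several missing levels, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code. It assumes values are removed completely at random (MCAR) from the feature columns only, with the class label left intact. The abstract does not say which missingness mechanism was used. `make_missing` is a hypothetical helper, and the toy data are invented for illustration.

```python
import random
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def make_missing(
    rows: Sequence[Sequence[float]],
    rate: float,
    exclude_cols: Sequence[int] = (),
    seed: int = 0,
) -> List[Row]:
    """Return a copy of `rows` with a fraction `rate` of eligible cells set to None.

    Assumption: cells are chosen uniformly at random (MCAR). Columns listed in
    `exclude_cols` (e.g. the class label) are never masked.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    n_cols = len(rows[0])
    # All (row, column) positions that are allowed to become missing.
    cells = [
        (i, j)
        for i in range(len(rows))
        for j in range(n_cols)
        if j not in exclude_cols
    ]
    k = round(rate * len(cells))              # number of cells to mask
    rng = random.Random(seed)                 # local RNG: reproducible, no global state
    out: List[Row] = [list(r) for r in rows]  # copy so the input is left untouched
    for i, j in rng.sample(cells, k):         # k distinct positions, no repeats
        out[i][j] = None
    return out


if __name__ == "__main__":
    # Hypothetical toy data: two numeric features and a 0/1 label in column 2.
    data = [[float(i), float(2 * i), float(i % 2)] for i in range(10)]
    for level in (0.05, 0.10, 0.20):
        masked = make_missing(data, level, exclude_cols=(2,), seed=42)
        n_missing = sum(v is None for row in masked for v in row)
        print(f"missing level {level:.0%}: {n_missing} of 20 feature cells masked")
```

Each masked copy would then be completed, for example by multiple imputation or by random forest imputation, before the SVM classifier is trained. Fixing `seed` for each missing level means both imputation methods see identical missing patterns. That matches the abstract's "under the same conditions" comparison.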
Authors
WANG Chun-jie (王纯杰)
ZHANG Le (张乐)
CHEN Jia (陈嘉)
WANG Shu-ying (王淑影)
School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
Source
Journal of Jilin Normal University: Natural Science Edition (《吉林师范大学学报(自然科学版)》)
2020, No. 4, pp. 36-40 (5 pages)
Funding
National Natural Science Foundation of China (11671054)
Changchun Science and Technology Innovation "Double Ten Project" (18SS013)
Keywords
multiple imputation
random forest imputation
support vector machine
Indian liver data