
A Comparative Study of Classification Models: An Empirical Analysis Based on Heart Disease Patient Data (Cited by: 1)

Analysis On Application of Support Vector Machine in Heart Disease Data
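Abstract: This paper takes real medical data as its object of study and applies logistic regression, support vector machine, and random forest classification models to analyze the raw data and make predictions. Logistic regression and random forest are used to identify the factors with the greatest influence on heart disease, such as family history and cumulative smoking, and targeted recommendations are then proposed. Cross-validation is used to find the best kernel function and penalty coefficient for the support vector machine, yielding the optimal classification model. The classification performance of the three models is then compared: the logistic regression model achieves a prediction accuracy of 77.38%, and its results are highly interpretable; the support vector machine and random forest models achieve prediction accuracies of 78.43% and 79.21%, respectively. The results show that the nonlinear models classify better than the linear model. The support vector machine and random forest models are computationally simple and efficient, learn and predict well on high-dimensional, large-scale data, and train quickly. The random forest model additionally offers interpretability and overcomes overfitting, so it has great application potential in heart disease and other medical diagnosis.

This paper takes real medical data as its object and uses logistic regression, support vector machine, and random forest classification models to classify the data. The data are analyzed and predictions are made, and logistic regression is used to find factors that have a greater impact on heart disease, such as familial factors and smoking, from which recommendations are made. The optimal kernel function and penalty coefficient of the support vector machine are found by cross-validation, and the optimal classification model is obtained. Comparing the classification results of the three models, the prediction accuracy of the logistic regression model is 77.38%, and the model results are interpretable. The prediction accuracies of the support vector machine and random forest models are 78.43% and 79.21%. Owing to the insufficient amount of data, the prediction accuracy of the logistic regression model is only slightly lower than that of the support vector machine and random forest algorithms, but the support vector machine and random forest models still have the advantages of simple calculation, high operating efficiency, strong learning and forecasting ability, and short training time. Moreover, the random forest model takes interpretability into account, so it has great application potential in heart disease and other medical diagnosis.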
Author: Zhang Bingjie (张冰洁), School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China
Source: Journal of the Postgraduate of Zhongnan University of Economics and Law (《中南财经政法大学研究生学报》), 2017, No. 6, pp. 18-26 (9 pages)
Keywords: Support Vector Machine; Logistic Regression; Random Forest; Heart Disease Diagnosis
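
The workflow summarized in the abstract (cross-validated selection of the SVM kernel and penalty coefficient, then an accuracy comparison of the three classifiers) might be sketched roughly as follows. This is an illustration only, not the author's code: it assumes scikit-learn, uses synthetic data in place of the heart disease data set (which is not reproduced in this record), and the candidate kernels, C values, and fold count are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the patient data (assumption: binary outcome, 10 features).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Cross-validated grid search over the SVM kernel and penalty coefficient C.
# The grid values below are illustrative, not those used in the paper.
svm_pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["linear", "rbf", "poly"],
    "svc__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(svm_pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

# Compare the three model families by mean 5-fold cross-validated accuracy.
models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": search.best_estimator_,
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

print("best SVM parameters:", search.best_params_)
for name, acc in accuracy.items():
    print(f"{name}: {acc:.4f}")
```

Because the SVM settings are chosen and scored on the same folds here, its reported accuracy is somewhat optimistic. A held-out test set or nested cross-validation would give a fairer comparison.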