Abstract
Online medical diagnosis is growing in popularity, and the volume of electronic medical records is growing with it. How to reduce redundancy in this data, extract useful medical knowledge quickly, and improve the speed and accuracy of diagnosis has become an active research topic. To address these problems, this paper studies lung cancer diagnosis data from a medical system and builds a C4.5 algorithm that uses attribute reduction based on a discernibility matrix improved with attribute dependency; the algorithm is then further improved with a random forest. The attribute reduction step lowers the redundancy of the medical data, the decision tree extracts rules for lung cancer diagnosis, and the random forest raises diagnostic accuracy. Simulation experiments and an application are carried out for a lung cancer diagnosis scenario, comparing three configurations: plain C4.5, attribute reduction with a single C4.5 decision tree, and attribute reduction with a random forest of C4.5 decision trees. The experimental results show that the proposed method speeds up computation and improves the accuracy of medical diagnosis.
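The abstract does not spell out the paper's attribute-dependency improvement, so the following is only a minimal sketch of the generic idea behind discernibility-matrix attribute reduction, not the authors' algorithm. It assumes a small decision table whose rows, attribute meanings, and labels are invented purely for illustration, and it uses a plain greedy selection (pick the attribute that appears in the most matrix entries) as a stand-in for the paper's selection rule. The function names `discernibility_matrix` and `greedy_reduct` are hypothetical.

```python
from itertools import combinations


def discernibility_matrix(rows, decisions):
    """Collect, for every pair of objects with different decisions,
    the set of condition-attribute indices on which the pair differs."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(
                k for k, (a, b) in enumerate(zip(rows[i], rows[j])) if a != b
            )
            if diff:  # an empty set would mean the table is inconsistent
                entries.append(diff)
    return entries


def greedy_reduct(entries):
    """Greedy stand-in for reduct search: repeatedly keep the attribute
    that occurs in the most entries not yet covered (ties go to the
    lowest index), until every entry contains a kept attribute."""
    reduct = set()
    remaining = list(entries)
    while remaining:
        counts = {}
        for entry in remaining:
            for attr in entry:
                counts[attr] = counts.get(attr, 0) + 1
        best = max(sorted(counts), key=counts.get)
        reduct.add(best)
        remaining = [e for e in remaining if best not in e]
    return reduct


# Toy decision table (made-up values, not data from the paper).
# Columns: 0 = smoker, 1 = persistent cough, 2 = age group,
# 3 = a redundant copy of column 1.
rows = [
    (1, 1, 0, 1),
    (1, 0, 0, 0),
    (0, 1, 1, 1),
    (0, 0, 1, 0),
    (1, 1, 1, 1),
]
decisions = [1, 0, 0, 0, 1]

reduct = greedy_reduct(discernibility_matrix(rows, decisions))
print(sorted(reduct))  # prints [0, 1]
```

On this toy table the redundant column and the age column are dropped, and the two kept attributes still separate every pair of objects with different decisions. A tree learner such as C4.5 would then be trained on the reduced columns only.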
Source
Computer Technology and Development (《计算机技术与发展》)
2017, No. 12, pp. 148-152 (5 pages)
Funding
National Undergraduate Innovation Program (201510290002)
Keywords
rough set
attribute reduction
discernibility matrix
C4.5 algorithm
decision tree