Abstract
Traditional database statistics make it difficult to mine the internal relationships in student physical fitness test data and to compare that data horizontally. To address this, an improved C4.5 decision tree algorithm is proposed. It computes the split information measure from an improved information entropy and a simplified functional relation. It also eliminates the gain-ratio computation for unnecessary split points of continuous attributes. Together, these changes improve the algorithm's running efficiency while preserving prediction accuracy. The decision tree algorithm is embedded in a MySQL-based physical fitness testing platform for enrolled college students. There it performs data mining and processing, which enables standardized processing of the students' fitness test data and mining of its internal relationships. Case validation results show that the vital capacity test item is the factor with the greatest influence on college students' physical health. Height and weight are the main factors behind failures in the vital capacity test.
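The abstract does not give the authors' improved entropy or their simplified split-information formula, so the sketch below does not reproduce their method. Under that assumption, it shows the standard C4.5 quantities the abstract builds on: information gain, split information, and gain ratio for a binary split of a continuous attribute. It also shows one well-known way to skip unnecessary candidate split points. Only "boundary" midpoints are kept, meaning those where the class changes between adjacent distinct values. Fayyad and Irani showed that the information-gain-maximizing cut always lies at such a point. Whether this matches the paper's pruning rule is an assumption. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import groupby


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_stats(values, labels, threshold):
    """Information gain and gain ratio of the split x <= threshold vs x > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    w_l, w_r = len(left) / n, len(right) / n
    gain = entropy(labels) - (w_l * entropy(left) + w_r * entropy(right))
    # Split information: entropy of the partition sizes themselves (standard C4.5).
    split_info = -(w_l * math.log2(w_l) + w_r * math.log2(w_r))
    return gain, gain / split_info


def candidate_thresholds(values, labels, boundary_only=True):
    """Midpoints between adjacent distinct values of a continuous attribute.

    With boundary_only=True, a midpoint is skipped when all samples at the two
    adjacent values share one class; such a cut cannot maximize information gain.
    """
    pairs = sorted(zip(values, labels))
    # One (value, set_of_classes_at_that_value) entry per distinct value.
    groups = [(v, {y for _, y in grp}) for v, grp in groupby(pairs, key=lambda p: p[0])]
    cuts = []
    for (v1, c1), (v2, c2) in zip(groups, groups[1:]):
        if boundary_only and len(c1 | c2) == 1:
            continue  # same single class on both sides: not a boundary point
        cuts.append((v1 + v2) / 2)
    return cuts


def best_split(values, labels, boundary_only=True):
    """Choose the threshold with the highest information gain and report its gain ratio."""
    best = None
    for t in candidate_thresholds(values, labels, boundary_only):
        gain, ratio = split_stats(values, labels, t)
        if best is None or gain > best[1]:
            best = (t, gain, ratio)
    return best  # (threshold, info_gain, gain_ratio), or None if no cut exists


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    values = [1, 2, 3, 4, 5, 6, 7, 8]
    labels = ["A", "A", "A", "B", "B", "B", "B", "A"]
    print(candidate_thresholds(values, labels))                       # [3.5, 7.5]
    print(len(candidate_thresholds(values, labels, boundary_only=False)))  # 7
    print(best_split(values, labels))
```

On this toy input, the boundary rule cuts the candidate thresholds from 7 to 2, and the best threshold (3.5) is unchanged. This is the kind of saving the abstract attributes to skipping unnecessary split points. Following Quinlan's C4.5, the threshold is chosen by information gain here; the gain ratio is then what would be compared across attributes.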
Authors
张雪琴
江帆
席本玉
ZHANG Xueqin; JIANG Fan; XI Benyu (Xi’an Jiaotong University City College, Xi’an 710018, China)
Source
Electronic Design Engineering (《电子设计工程》)
2022, No. 13, pp. 87-90, 95 (5 pages)