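Abstract
Objective: To explore the application of decision tree techniques in research on anemia among rural children. Methods: In the Enterprise Miner module of SAS 8.2, health-care survey data on 3000 weaned rural children under 3 years of age were split 75%/25% into a training set for initial model fitting and a validation set for model adjustment. A CART decision tree model was built using the Gini impurity function, and the fitted model was evaluated by misclassification rate, ROC curve, Root ASE, and a diagnostic chart. The variables retained in the model, and their hierarchical positions within the tree, were used to analyze the factors influencing anemia in weaned rural children under 3 years of age and the interactions among those factors. Results: For the CART decision tree model, the misclassification rates in the training and validation sets were 21.2% and 21.9%, and the Root ASE values were 0.399 and 0.404, respectively. The model's ROC curve lay above the reference line, with a large area under the curve. In the diagnostic chart, the proportion of observations whose actual and predicted values agreed was the largest, and the agreement rate for correctly classified observations was clearly higher than that for misclassified observations. The model selected 9 important factors influencing childhood anemia and ranked them by relative importance: maternal anemia (1.00) was the most important, followed by the child's age in months (0.75), the child's weaning time (0.53), the mother's age (0.32), time of introducing eggs (0.26), project county category (0.26), time of introducing fresh milk (0.16), family size (0.13), and the mother's years of education (0.12). Conclusion: Decision tree techniques offer a new approach to the effective analysis of data in child health care research.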
Objective: To study the application of decision trees in research on anemia among rural children. Methods: In the Enterprise Miner module of SAS 8.2, 3000 observations were sampled from the database and a decision tree model was built. The model was a CART decision tree based on the Gini impurity index. Results: The misclassification rate of the decision tree model was 21.2% in the training set and 21.9% in the validation set. The Root ASE of the model was 0.399 in the training set and 0.404 in the validation set. The ROC curve lay above the reference line, with a large area under the curve. The diagnostic chart showed that the percentage of observations whose predicted values agreed with the actual values was higher than the percentage that disagreed. The decision tree model selected 9 important factors and ranked them by relative importance, among which maternal anemia (1.00) was the most important. The others were the child's age (0.75), time of weaning (0.53), mother's age (0.32), time of egg supplementation (0.26), category of the project county (0.26), time of milk supplementation (0.16), number of people in the family (0.13), and the mother's education (0.12). The decision tree produced simple, easily understood rules that might be used for classification and prediction in similar research. Conclusion: Decision trees can screen out the important factors for anemia and identify cut-points for those factors. As decision trees become more widely applied, they are likely to prove valuable in research on rural child health care.
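The quantities named in the abstract (Gini impurity, a 75%/25% split, misclassification rate, and Root ASE) can be illustrated with a minimal Python sketch. This is not the SAS Enterprise Miner implementation used in the study. All function names are hypothetical, the 0.5 probability cutoff is an assumption, and Root ASE is taken here to be the square root of the mean squared difference between the 0/1 outcome and the predicted probability, which is an assumed reading of that statistic for a binary target.

```python
import math
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(values, labels):
    """Find the threshold t (rule: value <= t) that minimizes the
    size-weighted Gini impurity of the two child nodes.
    Returns (None, parent impurity) if no split improves on the parent."""
    n = len(values)
    best = (None, gini(labels))
    for t in sorted(set(values))[:-1]:  # the largest value cannot split
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


def split_75_25(rows, seed=0):
    """Shuffle rows and return (training 75%, validation 25%)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.75 * len(rows))
    return rows[:cut], rows[cut:]


def misclassification_rate(y_true, p_pred, cutoff=0.5):
    """Share of cases whose predicted class (p >= cutoff) is wrong.
    The 0.5 cutoff is an assumption, not taken from the study."""
    wrong = sum((p >= cutoff) != bool(y) for y, p in zip(y_true, p_pred))
    return wrong / len(y_true)


def root_ase(y_true, p_pred):
    """Square root of the average squared error between the 0/1
    outcome and the predicted probability (assumed definition)."""
    sse = sum((y - p) ** 2 for y, p in zip(y_true, p_pred))
    return math.sqrt(sse / len(y_true))
```

With invented toy values, `best_split([1, 2, 3, 4], [1, 1, 0, 0])` selects the threshold 2, because it separates the two classes completely and leaves both child nodes with zero impurity. A full CART procedure would apply this search recursively across all candidate variables and then prune the tree, which the sketch does not attempt.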
Source
Chinese Journal of Preventive Medicine (《中华预防医学杂志》)
CAS
CSCD
Peking University Core Journals (北大核心)
2009, Issue 5, pp. 434-437 (4 pages)
Funding
Project funded by the Ministry of Health and UNICEF (YH001)
National Natural Science Foundation of China (30771866)
Keywords
Decision tree
Anemia
Child
Misclassification rate