Abstract
Objective: To compare three attribute reduction methods applied to traditional Chinese medicine (TCM) syndrome data. Methods: Bivariate correlation analysis, principal component analysis, and rough-set-based attribute reduction were each applied to the same TCM syndrome data set of primary insomnia. C4.5 decision tree classification models for the syndrome of liver stagnation transforming into fire in primary insomnia were then built and evaluated by 5-fold cross-validation. Results: The rough set reduction model outperformed the other two reduction models on every index, and its area under the receiver operating characteristic (ROC) curve was significantly larger than that of either of the other two models (P<0.01). The areas under the ROC curve of the correlation reduction model and the principal component reduction model did not differ significantly (P>0.05). Conclusions: Rough-set-based attribute reduction eliminates unnecessary knowledge from the decision table as far as possible while maintaining high classification ability, yielding a small attribute subset. It is a feasible method for reducing TCM syndrome data.
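The abstract does not say which rough set algorithm was used, so the following is only a minimal sketch of the general idea, assuming a greedy QuickReduct-style search over a toy decision table. The symptom names, the six rows, and the function names are all hypothetical and are not taken from the study's data. Each step adds the condition attribute that most increases the dependency degree (the fraction of cases whose equivalence class determines the decision) until the subset classifies as well as the full attribute set.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Dependency degree: share of rows lying in the positive region, i.e. in
    equivalence classes whose members all carry the same decision value."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def quick_reduct(rows, conditions, decision):
    """Greedy forward selection (QuickReduct-style, an assumed choice):
    stop once the subset matches the dependency of all condition attributes."""
    target = dependency(rows, conditions, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in conditions if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

# Hypothetical toy decision table: 1 = symptom present, "pattern" is the decision.
conditions = ["irritability", "bitter_taste", "fatigue"]
rows = [
    {"irritability": 1, "bitter_taste": 1, "fatigue": 0, "pattern": 1},
    {"irritability": 1, "bitter_taste": 0, "fatigue": 1, "pattern": 0},
    {"irritability": 0, "bitter_taste": 1, "fatigue": 1, "pattern": 0},
    {"irritability": 0, "bitter_taste": 0, "fatigue": 0, "pattern": 0},
    {"irritability": 1, "bitter_taste": 1, "fatigue": 1, "pattern": 1},
    {"irritability": 0, "bitter_taste": 0, "fatigue": 1, "pattern": 0},
]

reduct = quick_reduct(rows, conditions, "pattern")
print(reduct)
```

In this toy table the decision depends only on the first two symptoms, so the third attribute is dropped. A greedy search of this kind returns one reduct, not necessarily a minimal one, and on real data the reduced attribute set would still need to be validated with a classifier, as the study did with C4.5 and 5-fold cross-validation.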
Source
《中医杂志》
CSCD
Peking University Core Journals (北大核心)
2012, No. 4, pp. 321-323, 330 (4 pages in total)
Journal of Traditional Chinese Medicine
Funding
Guangdong Province Project for Building a Strong Province in Traditional Chinese Medicine (No. 2010134)
Keywords
data mining
attribute reduction
TCM syndrome
insomnia