Abstract
[Objective] Taking high-dimensional structured electronic medical record (EMR) data as the research object, this paper explores strategies for reducing data dimensionality and provides a reference for knowledge discovery from electronic medical records. [Methods] First, we conducted a preliminary reduction of attributes based on a literature review. We then performed a second round of dimension reduction with three methods: principal component analysis (PCA) retaining factors with eigenvalues greater than 1, PCA retaining factors whose cumulative contribution rate exceeds 85%, and logistic regression retaining factors with significant differences. Finally, we evaluated the attributes extracted by the three methods qualitatively and quantitatively in an empirical study. [Results] The three methods extracted 8, 17 and 14 attributes, respectively. The qualitative and quantitative evaluation showed that PCA retaining factors with eigenvalues greater than 1 achieved relatively better dimension reduction. [Limitations] The sample size was limited, and we could not collect data spanning a longer period for more in-depth analysis. [Conclusions] The proposed dimension reduction strategy is effective. While preserving the original characteristics of the target data, it identifies, locates and analyzes high-dimensional data and replaces the full dataset with a smaller set of attributes. This mitigates the curse of dimensionality that overly high-dimensional EMR data poses for data mining, improving both the efficiency of data mining and the accuracy of its results.
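The abstract names two retention rules for PCA (eigenvalue greater than 1, cumulative contribution rate greater than 85%) and a significance filter on logistic regression results. The minimal Python sketch below illustrates those three selection rules, assuming the eigenvalues of the correlation matrix and the per-attribute p-values have already been computed elsewhere. The function names, the example eigenvalues, the attribute names and the 0.05 significance level are hypothetical illustrations, not code or data from the paper, and the abstract does not state how the retained components were mapped back to the 8 and 17 attributes it reports.

```python
from typing import List, Mapping, Sequence


def kaiser_count(eigenvalues: Sequence[float]) -> int:
    """Count components whose eigenvalue exceeds 1 (Kaiser criterion).

    Assumes eigenvalues of a correlation matrix, where each standardized
    variable contributes a variance of exactly 1.
    """
    return sum(1 for ev in eigenvalues if ev > 1.0)


def cumulative_count(eigenvalues: Sequence[float], threshold: float = 0.85) -> int:
    """Smallest k such that the k largest components explain more than `threshold` of total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, ev in enumerate(ordered, start=1):
        running += ev
        if running / total > threshold:
            return k
    return len(ordered)


def significant_attributes(p_values: Mapping[str, float], alpha: float = 0.05) -> List[str]:
    """Keep attributes whose logistic regression p-value is below `alpha` (alpha is an assumption)."""
    return [name for name, p in p_values.items() if p < alpha]


if __name__ == "__main__":
    # Hypothetical eigenvalues for 10 standardized attributes (they sum to 10).
    eigs = [3.2, 2.1, 1.4, 1.1, 0.9, 0.5, 0.3, 0.25, 0.15, 0.1]
    print(kaiser_count(eigs))      # components with eigenvalue > 1
    print(cumulative_count(eigs))  # components needed to pass 85%
    # Hypothetical p-values from an already fitted logistic regression.
    print(significant_attributes({"age": 0.01, "gender": 0.40, "blood_pressure": 0.03}))
```

With these illustrative eigenvalues the eigenvalue rule keeps 4 components while the 85% rule keeps 5, which mirrors the pattern in the abstract: the eigenvalue-greater-than-1 rule tends to retain fewer dimensions than a high cumulative-contribution threshold.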
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSSCI
CSCD
Peking University Core Journal (北大核心)
2018, No. 1, pp. 88-98 (11 pages)
Funding
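National Natural Science Foundation of China General Program project "Construction of Domain Multidimensional Knowledge Bases Driven by Embedded Knowledge Services" (Grant No. 71573102)
Social Science project of the Education Department of Jilin Province "Knowledge Discovery and Empirical Research in Virtual Health Communities" (Grant No. JJKH20170881SK). This paper is one of the research outcomes of the above projects.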
Keywords
Dimension Reduction
Data Mining
Knowledge Discovery
Electronic Medical Record