Abstract
To improve the performance of distributed classification of massive electronic medical records, this paper proposes partitioning the records using rough set theory. First, the algorithms used in the study are examined: Chinese word segmentation, feature selection, data partitioning, and parallelized classification. Next, the performance of random partitioning, attribute-value partitioning, and information-entropy partitioning of electronic medical records is analyzed, and a rough-set-based algorithm for partitioning massive electronic medical records is proposed. Experimental results show that, for disease classification, the classifier built with rough-set partitioning outperforms those built with random, attribute-value, and information-entropy partitioning, and that it remains relatively stable as the number of blocks increases.
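The abstract does not spell out the rough-set algorithm itself, so the following is only a minimal sketch of the standard rough set notions that such a split would presumably build on: indiscernibility classes over condition attributes and the dependency degree of a decision attribute. The function `split_by_classes`, the toy records, and the attribute names (`fever`, `cough`, `disease`) are hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict

def indiscernibility_classes(records, attrs):
    """Group record indices that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, rec in enumerate(records):
        classes[tuple(rec[a] for a in attrs)].append(i)
    return list(classes.values())

def dependency_degree(records, cond_attrs, decision):
    """gamma = |POS| / |U|: share of records lying in classes
    whose members all carry the same decision value."""
    pos = 0
    for cls in indiscernibility_classes(records, cond_attrs):
        if len({records[i][decision] for i in cls}) == 1:
            pos += len(cls)
    return pos / len(records)

def split_by_classes(records, cond_attrs, n_blocks):
    """Hypothetical split (an assumption, not the paper's algorithm):
    keep each indiscernibility class whole and give it to the
    currently smallest block, largest classes first."""
    blocks = [[] for _ in range(n_blocks)]
    classes = indiscernibility_classes(records, cond_attrs)
    for cls in sorted(classes, key=len, reverse=True):
        min(blocks, key=len).extend(cls)
    return blocks

# Toy records; attribute names and values are invented for illustration.
records = [
    {"fever": "yes", "cough": "yes", "disease": "flu"},
    {"fever": "yes", "cough": "yes", "disease": "flu"},
    {"fever": "no", "cough": "yes", "disease": "cold"},
    {"fever": "no", "cough": "yes", "disease": "flu"},
    {"fever": "no", "cough": "no", "disease": "healthy"},
]

print(indiscernibility_classes(records, ["fever", "cough"]))
print(dependency_degree(records, ["fever", "cough"], "disease"))
print(split_by_classes(records, ["fever", "cough"], 2))
```

In this sketch, records that are indistinguishable on the chosen attributes always land in the same block, which is one plausible way a split could avoid scattering mutually consistent (or mutually conflicting) records across workers; a random split gives no such guarantee.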
Source
《工业控制计算机》
2017, No. 1, pp. 100-101, 103 (3 pages in total)
Industrial Control Computer