Short Text Classification of EMR Based on Entities and Dependency Parser

Original title: 利用实体与依存句法结构特征的病历短文本分类方法
Cited by: 2
Abstract: In recent years, the classification and mining of electronic medical record (EMR) text have become a foundation of medical big-data research. This paper proposes a short-text classification method for EMRs that builds its feature set from entities and dependency syntactic structure analysis. The record text is first processed with natural language processing, including sentence segmentation, word segmentation, part-of-speech tagging, and entity extraction, and entity dictionaries are built from the results. A term-document matrix is then constructed with TF-IDF, and latent semantic analysis (LSA) is used to select lexical features. Next, the dependency relations in the record text are analyzed, and the dependencies between words are extracted as feature triples that extend the classification features. Finally, the resulting feature vectors are used to classify the short texts. Experiments show that, compared with short-text classification that uses lexical features only (no feature expansion), the proposed method improves classifier performance, with clearly higher precision and F-value.
Source: Chinese Journal of Medical Instrumentation (《中国医疗器械杂志》), 2016, No. 4, pp. 245-249 (5 pages)
Keywords: electronic medical record (EMR); short text; TF-IDF; LSA; dependency parsing; feature triple (triple dependency relation feature)
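
The feature-expansion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the relation labels (`nsubj`, `obj`), the helper names `expand` and `tfidf_matrix`, and the smoothed IDF formula are all assumptions made for illustration, and the LSA step is omitted because it needs a singular value decomposition from a numerical library. Tokens and dependency triples are hand-written here; in practice they would come from an NLP toolkit.

```python
import math
from collections import Counter

# Hypothetical toy records: (tokens, dependency triples as (head, relation, dependent)).
# Both are hand-written for illustration; a real pipeline would obtain them from a parser.
records = [
    (["patient", "complains", "chest", "pain"],
     [("complains", "nsubj", "patient"), ("complains", "obj", "pain")]),
    (["chest", "x-ray", "shows", "no", "abnormality"],
     [("shows", "nsubj", "x-ray"), ("shows", "obj", "abnormality")]),
    (["patient", "denies", "fever"],
     [("denies", "nsubj", "patient"), ("denies", "obj", "fever")]),
]


def expand(tokens, triples):
    """Append each dependency triple to the token list as one extra feature term."""
    return list(tokens) + ["|".join(triple) for triple in triples]


def tfidf_matrix(docs):
    """Build a dense TF-IDF matrix (one row per document, one column per term).

    Uses idf = ln(N / df) + 1, an assumed smoothing choice; the paper's exact
    weighting is not given in the abstract.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) + 1.0 for term in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        rows.append([counts[term] / total * idf[term] for term in vocab])
    return vocab, rows


# Lexical features plus triple features form the expanded term list per record.
docs = [expand(tokens, triples) for tokens, triples in records]
vocab, matrix = tfidf_matrix(docs)
print(len(matrix), "records x", len(vocab), "features")
```

Each row of `matrix` could then be passed to any standard classifier. Encoding a triple as a single term keeps the word-level and dependency-level features in one vector space, which is one simple way to realize the "expansion" the abstract describes.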