
Named Entity Extraction of Traditional Chinese Medicine Medical Records Based on Conditional Random Field (基于条件随机场的中医临床病历命名实体抽取)

Cited by: 31
Abstract: Clinical medical records are an important research data resource for Traditional Chinese Medicine (TCM), but they are still recorded mainly as free text. In-depth analysis of the records therefore requires structured information extraction first, and named entity extraction is its foundational step. To extract named entities such as symptoms, diseases, and inducing factors from TCM clinical records, this study applies Conditional Random Field (CRF), Hidden Markov Model (HMM), and Maximum Entropy Markov Model (MEMM) to 413 manually annotated medical records, using Chinese characters as features together with four types of feature templates, and compares the results. With suitable feature templates, CRF performs well, reaching F1 scores of 0.80 for symptoms, 0.74 for disease names, and 0.74 for inducing factors. Compared with HMM and MEMM, CRF achieves the highest precision and recall, which suggests that it is a suitable method for named entity extraction from TCM clinical records.
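The abstract states that Chinese characters serve as features and that they are combined through feature templates, but it does not list the four templates. The following is a minimal sketch of what a character-window template could look like, not the authors' actual configuration: the function name `char_window_features`, the window size, the boundary markers `<BOS>`/`<EOS>`, and the sample sentence are all assumptions made for illustration.

```python
from typing import Dict, List


def char_window_features(chars: List[str], i: int, window: int = 1) -> Dict[str, str]:
    """Build unigram and adjacent-bigram character features around position i.

    Hypothetical template for illustration; the paper's four templates
    are not given in the abstract.
    """

    def at(j: int) -> str:
        # Pad positions outside the sentence with boundary markers.
        if j < 0:
            return "<BOS>"
        if j >= len(chars):
            return "<EOS>"
        return chars[j]

    feats: Dict[str, str] = {}
    # Unigram features: each character in the window, keyed by relative offset.
    for off in range(-window, window + 1):
        feats[f"c[{off}]"] = at(i + off)
    # Bigram features: each adjacent character pair inside the window.
    for off in range(-window, window):
        feats[f"c[{off}]|c[{off + 1}]"] = at(i + off) + "|" + at(i + off + 1)
    return feats


# Invented sample sentence ("patient has coughed for three days"), split into characters.
sentence = list("患者咳嗽三天")
features = [char_window_features(sentence, i) for i in range(len(sentence))]
```

Each per-character feature dictionary would then be paired with a hand-annotated label (for example a begin/inside/outside tag for symptom, disease, or inducing-factor spans) and passed to a sequence labeler such as a CRF. Widening `window` adds more context per character at the cost of a larger feature space.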
Source: Computer Engineering (《计算机工程》), CAS, CSCD, 2014, Issue 9, pp. 312-316 (5 pages)
Funding: National Natural Science Foundation of China (61105055, 81230086); National "863" Program of China (2012AA02A609); Fundamental Research Funds for the Central Universities (K13JB00140)
Keywords: Traditional Chinese Medicine (TCM) medical records; named entity extraction; corpus annotation system; Conditional Random Field (CRF); feature template
