Abstract
Drawing on the data storage characteristics of the HL7 (Health Level Seven) standard, we analyze in depth the content and structure of current electronic medical records. We propose a five-tuple pattern for medical information, together with a finer-grained two-tuple pattern and semantic class descriptions. On this basis, we propose a series of algorithms for pattern generalization, pattern acquisition, and automatic medical information extraction. Experiments on 312 real inpatient medical records show that the system achieves good precision and recall. Because the system learns automatically, its overall performance is expected to improve further as the training corpus grows.
Source
Journal of Harbin Institute of Technology (哈尔滨工业大学学报)
EI
CAS
CSCD
PKU Core (北大核心)
2011, Issue 11, pp. 89-94 (6 pages)
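Funding
National Natural Science Foundation of China (60803092)
Shandong Province Outstanding Young and Middle-aged Scientists Research Award Fund (2010BSA10014)
Shandong Province Science and Technology Research Program (2009GG10002053)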
Keywords
electronic medical record
information extraction
HL7
automatic pattern discovery