Abstract
A corpus of medical diagnosis and treatment entities is built from published Chinese case reports, together with a platform for annotating and reviewing the corpus, so that four types of entities (diseases, symptoms, examinations, and treatments) can be identified on the basis of contextual semantic understanding. Using character, word-boundary, context, part-of-speech, and dictionary features, the paper proposes a multi-feature fusion method based on the conditional random field (CRF) model for recognizing diagnosis and treatment named entities in Chinese case reports. The method is reported to achieve relatively high recognition accuracy.
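The feature types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `char_features`, the tiny lexicon `MEDICAL_DICT`, the B/M/E/S boundary scheme, and the one-character context window are all assumptions made for illustration, and the input is assumed to be already segmented and part-of-speech tagged by some external tool.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mini-lexicon of medical terms; the paper's actual dictionary is not given.
MEDICAL_DICT: Set[str] = {"肺炎", "咳嗽", "胸部CT"}


def char_features(words: List[Tuple[str, str]],
                  lexicon: Set[str] = MEDICAL_DICT,
                  window: int = 1) -> List[Dict[str, str]]:
    """Turn a segmented, POS-tagged sentence into one feature dict per character."""
    chars, bounds, tags, in_dict = [], [], [], []
    for word, pos in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            # Word-boundary feature: B(egin), M(iddle), E(nd), S(ingle-character word).
            if len(word) == 1:
                bounds.append("S")
            elif i == 0:
                bounds.append("B")
            elif i == len(word) - 1:
                bounds.append("E")
            else:
                bounds.append("M")
            # Part-of-speech and dictionary features are inherited from the word.
            tags.append(pos)
            in_dict.append("Y" if word in lexicon else "N")

    feats = []
    for i, ch in enumerate(chars):
        f = {"char": ch, "bound": bounds[i], "pos": tags[i], "dict": in_dict[i]}
        # Context features: neighbouring characters within the window.
        for d in range(1, window + 1):
            f[f"char-{d}"] = chars[i - d] if i - d >= 0 else "<BOS>"
            f[f"char+{d}"] = chars[i + d] if i + d < len(chars) else "<EOS>"
        feats.append(f)
    return feats


# Assumed input: (word, POS tag) pairs from an external segmenter and tagger.
sentence = [("患者", "n"), ("咳嗽", "v"), ("3", "m"), ("天", "q")]
features = char_features(sentence)
print(features[2])
```

Each character receives one feature dictionary. A sequence of such dictionaries, paired with per-character entity labels, is the kind of input a CRF toolkit typically trains on; the labeling scheme and the training settings used in the paper are not stated in this record.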
Authors
夏光辉
李军莲
邢宝坤
崔胜男
XIA Guanghui; LI Junlian; XING Baokun; CUI Shengnan (Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China; Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China)
Source
《医学信息学杂志》
CAS
2019, No. 6, pp. 54-59 (6 pages)
Journal of Medical Informatics
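Funding
National Science and Technology Library (NSTL) preliminary R&D task "Next-Generation Open System for National Science and Technology Innovation Knowledge Services", project "Research on Semantic Annotation of Text Knowledge Objects" (No. XQYF0201)
CAMS Innovation Fund for Medical Sciences, Major Collaborative Innovation Project "Biomedical Science and Technology Information Support Platform" (No. 2016-I2M-2-005)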
Keywords
Chinese case report
medical treatment
Named Entity Recognition (NER)
Conditional Random Field (CRF)