Abstract
The Shanghai regional health platform integrates electronic medical records from 38 tertiary hospitals, and the diverse and ambiguous ways in which these hospitals name the same clinical laboratory indicator have seriously hampered medical record mining. Existing terminology bases are largely theoretical and cover little of the wording used in actual clinical practice, so a clinical laboratory indicator terminology base that unifies the 38 hospitals is needed. To address this, we propose a semi-automatic construction scheme built on four steps: schema definition, knowledge extraction, knowledge fusion, and knowledge verification. Taking the medical insurance terminology issued by the Shanghai Municipal Health Commission as the standard, we first build a standard indicator sub-base, then use a BERT-based clinical indicator alignment model to attach the indicators of the 38 hospitals to the standard terms as synonyms. The resulting terminology base contains 23,495 entities and 47,746 factual triples and can support applications such as medical record cleaning and medical record retrieval. Experiments show that the alignment model reaches an F1-score of 95.78%, and that using the terminology base in a colorectal cancer mining project increases the number of records retrieved by up to 94%. In addition, the disease-specific terminology base for colorectal cancer indicators has been published at dcakb.ecustnlplab.com.
Authors
ZHANG Zhixing
ZHANG Jiaying
GAO Daqi
RUAN Tong
WANG Jun
HE Ping
YAO Huayan
Affiliations: School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; SimMed, Shanghai 200436, China; Shanghai Hospital Development Center, Shanghai 200041, China; Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200025, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core (北大核心)
2020, No. 12, pp. 100-110 (11 pages)
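Funding
National Natural Science Foundation of China (61772201)
National Key Research and Development Program of China, "Precision Medicine Research" Major Special Project (2018YFC0910500)
Fudan Pediatric Medical Consortium Internet Hospital Project Based on the Shanghai Regional Health Information Platform (201701013)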
Keywords
medical record mining
clinical indicator terminology database
terminology database fusion