Abstract
Using a self-annotated standard dataset of Chinese electronic medical records (EMRs), this study combines similarity algorithms with pre-trained models, applying them to the candidate entity generation and entity disambiguation stages of entity mapping, respectively, and compares the performance of different similarity algorithms and pre-trained models. It proposes a method that uses the similarity between aliases to improve the mapping of drug entities, and combines the Jaccard similarity algorithm with the BERT pre-trained model to perform entity mapping efficiently on large volumes of Chinese EMRs.
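The candidate entity generation stage can be illustrated with a minimal sketch. It assumes character-level Jaccard similarity and scores each standard term by its best match over its name and aliases; the abstract does not specify the tokenization, the alias handling, or the cutoff, so `jaccard`, `generate_candidates`, `top_k`, and the toy vocabulary are all hypothetical. The BERT-based disambiguation stage, which would rerank these candidates, is not shown.

```python
from typing import Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Character-level Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty strings
    return len(set_a & set_b) / len(set_a | set_b)


def generate_candidates(
    mention: str,
    vocabulary: Dict[str, List[str]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank standard terms by their best Jaccard score over name and aliases."""
    scored = []
    for standard_term, aliases in vocabulary.items():
        # Assumption: an alias match counts as much as a match on the name.
        best = max(jaccard(mention, name) for name in [standard_term, *aliases])
        scored.append((standard_term, best))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vocabulary: standard term -> list of aliases.
vocabulary = {
    "阿司匹林": ["乙酰水杨酸"],
    "对乙酰氨基酚": ["扑热息痛"],
}
candidates = generate_candidates("阿司匹林肠溶片", vocabulary, top_k=1)
```

Here the mention "阿司匹林肠溶片" shares four of seven distinct characters with the standard term "阿司匹林", so that term is returned as the top candidate. In a full pipeline the short candidate list would then be passed to a pre-trained model for disambiguation.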
Authors
FENG Fengxiang (冯凤翔)
REN Huiling (任慧玲)
LI Xiaoying (李晓瑛)
WANG Weijie (王巍洁)
WANG Xu (王勖)
ZHANG Ying (张颖)
Institute of Medical Information & Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China
Source
Journal of Medical Informatics (《医学信息学杂志》)
CAS
2023, No. 5, pp. 45-50 (6 pages)
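Funding
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project, sub-project "Construction of a Chinese Medical Terminology System" (Grant No. 2020AAA0104901).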
Keywords
entity mapping
entity standardization
similarity algorithm
electronic medical record (EMR)
BERT model