摘要
临床术语标准化任务是医学统计中不可或缺的一部分。在实际应用中,一个标准的临床术语可能有数种口语化和非标准化的描述,而对于一些应用例如临床知识库的构建而言,如何将这些描述进行标准化是必须要面对的问题。该文主要关注中文临床术语的标准化任务,即将非标准的中文临床术语的描述文本和给定的临床术语库中的标准词进行对应。尽管一些深度判别式模型在简单文本结构的医疗术语,例如,疾病、药品名等的标准化任务上取得了一定成效,但对于中文临床术语标准化任务而言,其带标准化的描述文本中经常包含的信息缺失、"一对多"等情况,仅依靠判别式模型无法得到完整的语义信息,因而导致模型效果欠佳。该文将临床术语标准化任务类比为翻译任务,引入深度生成式模型对描述文本的核心语义进行生成并得到标准词候选集,再利用基于BERT的语义相似度算法对候选集进行重排序得到最终标准词。该方法在第五届中国健康信息处理会议(CHIP2019)评测数据中进行了实验并取得了很好的效果。
Clinical entity normalization is an indispensable part of medical statistics. In practice, a standard clinical term entity has several kinds of colloquialisms and non-standardized mentions, and for some applications such as the a clinical knowledge base construction, how to normalize these mentions is an issue that has to address. This paper is focused on the Chinese clinical entity normalization, i.e., linking non-standard Chinese clinical entity to the standard words which are in the given clinical terminology base. Specifically, we treat the clinical entity normalization task as a translation task, and employ a deep learning model to generate the core semantics of the clinical mentions and obtain the candidate set of the standard entity. The final standard words were obtained by re-ranking the candidate set by using a BERT-based semantic similarity model. Experiments on the data of the 5 th China Conference on Health Information Processing(CHIP2019) achieve good results.
作者
闫璟辉
向露
周玉
孙建
陈思
薛晨
YAN Jinghui;XIANG Lu;ZHOU Yu;SUN Jian;CHEN Si;XUE Chen(School of Computer Science and Information Technology&Beijing Key Lab of Traffic Data Analysis and Mining,Beijing Jiaotong University,Beijing 100044,China;National Laboratory of Pattern Recognition,Institute of Automation,Chinese Academy of Sciences,Beijing 100190,China;Beijing Fanyu Technology Co.Ltd,Beijing 100080,China;Fanyu AI Research,Beijing 100080,China;School of Artificial Intelligence,University of Chinese Academy of Sciences,Beijing 100049,China)
出处
《中文信息学报》
CSCD
北大核心
2021年第5期77-85,共9页
Journal of Chinese Information Processing
关键词
术语标准化
核心语义
生成式模型
entity normalization
core semantics
generative model