Abstract
Clinical term normalization is an indispensable task in extracting information from medical text. In clinical practice, the same diagnosis, operation (surgical procedure), drug, examination, laboratory test, or symptom is often written in many different ways, and term normalization aims to map each of these variants to its corresponding standard name. Building on candidate answers generated by information retrieval, this paper proposes a method that reranks the candidates with BERT (Bidirectional Encoder Representations from Transformers). Experiments on the CHIP2019 operation name normalization dataset show that the method reaches an accuracy of 89.1% with a single model and 92.8% with a fused model, which broadly meets the requirements of practical applications. The method also generalizes well and can be applied to normalization tasks for other types of medical terms.
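The retrieve-then-rerank pipeline described in the abstract can be sketched as follows. This is a minimal, self-contained sketch under stated assumptions, not the authors' implementation: a toy character-bigram BM25 index stands in for Lucene (whose default similarity is BM25), `char_jaccard` is a hypothetical placeholder for the fine-tuned BERT pair scorer, and averaging the scorers' outputs is an assumed form of model fusion, since the abstract does not say how fusion was done. The standard terms in the example are made up for illustration.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Overlapping character n-grams (usable on Chinese text without a word segmenter)."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class BM25Index:
    """Toy in-memory BM25 index over standard names; a stand-in for Lucene (assumption)."""

    def __init__(self, docs: Sequence[str], k1: float = 1.2, b: float = 0.75):
        if not docs:
            raise ValueError("need at least one standard term")
        self.docs = list(docs)
        self.k1, self.b = k1, b
        self.tfs = [Counter(char_ngrams(d)) for d in self.docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = (sum(self.lens) / len(self.lens)) or 1.0
        # Document frequency: in how many standard names each bigram occurs.
        df = Counter(t for tf in self.tfs for t in tf)
        n = len(self.docs)
        # BM25 idf in the form Lucene uses: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def search(self, query: str, top_k: int = 10) -> List[Tuple[str, float]]:
        """Return up to top_k (standard name, BM25 score) pairs, best first."""
        q_terms = char_ngrams(query)
        results = []
        for doc, tf, dl in zip(self.docs, self.tfs, self.lens):
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            score = sum(
                self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                for t in q_terms if t in tf
            )
            results.append((doc, score))
        results.sort(key=lambda pair: pair[1], reverse=True)
        return results[:top_k]


PairScorer = Callable[[str, str], float]


def char_jaccard(mention: str, candidate: str) -> float:
    """HYPOTHETICAL stand-in for a fine-tuned BERT (mention, candidate) scorer."""
    a, b = set(mention), set(candidate)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalize(mention: str, index: BM25Index,
              scorers: Sequence[PairScorer], top_k: int = 10) -> str:
    """Retrieve top_k candidates, then rerank by the mean of the scorers (assumed fusion)."""
    candidates = [term for term, _ in index.search(mention, top_k)]

    def fused(term: str) -> float:
        return sum(score(mention, term) for score in scorers) / len(scorers)

    return max(candidates, key=fused)


def top1_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of mentions whose top-ranked candidate equals the gold standard name."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


if __name__ == "__main__":
    standard = ["appendectomy", "laparoscopic appendectomy", "cholecystectomy",
                "laparoscopic cholecystectomy", "total knee replacement"]
    index = BM25Index(standard)
    print(normalize("lap. cholecystectomy", index, [char_jaccard]))
    # prints "laparoscopic cholecystectomy"
```

The design splits the work the way the abstract does: the cheap retrieval stage only needs to place the correct standard name somewhere in the top-k list, and the reranker then picks among those few candidates. In the paper's setting, each entry in `scorers` would be a BERT model scoring (mention, candidate) pairs, and passing several of them to `normalize` is one plausible way to realize a fused model.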
Authors
CHEN Mosha
QIU Wei
TAN Chuanqi
Affiliation: Machine Intelligence Technology, Alibaba, Hangzhou, Zhejiang 311121, China
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal
2021, No. 3, pp. 88-93 (6 pages)
Keywords
clinical operation term normalization
Lucene information retrieval
BERT