Abstract
[Objective] This paper integrates word information to improve the recognition and inference of key clinical features in online consultation records, addressing the difficulty of identifying named-entity boundaries. [Methods] We built a model based on MacBERT and conditional random fields. Word information such as word position and part of speech is incorporated through "soft" position embeddings, and dialogue information is introduced through speaker role embeddings. We also used weighted multi-class cross-entropy to address the imbalance among entity categories. [Results] In an empirical study on online consultation records from Chunyu Doctor, the proposed model achieved an F1 value of 74.35% on the named entity recognition task, nearly 2 percentage points higher than using the MacBERT model directly. [Limitations] We did not design a dedicated model for Chinese word segmentation. [Conclusions] Compared with using the MacBERT model directly, integrating word information and other additional feature dimensions effectively improves the model's recognition ability.
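As an illustration of the weighted multi-class cross-entropy mentioned in the abstract, the following is a minimal sketch in plain Python. The abstract does not say how the class weights are chosen. The inverse-frequency scheme w_c = N / (C * n_c), the normalization by the sum of weights, and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are all assumptions made here for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Hypothetical weighting scheme (an assumption): w_c = N / (C * n_c)."""
    counts = Counter(labels)
    total = len(labels)
    # Classes that never occur get weight 0.0 to avoid dividing by zero.
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over one row of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean of per-token negative log-likelihoods.

    logits:  list of per-token score rows, one score per class
    labels:  list of gold class indices, one per token
    weights: list of per-class weights
    """
    num = 0.0
    den = 0.0
    for row, y in zip(logits, labels):
        p = softmax(row)
        num += -weights[y] * math.log(p[y])
        den += weights[y]
    return num / den


# Toy example: class 1 is rare, so it receives a larger weight.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels, num_classes=2)
logits = [[2.0, 0.0], [1.5, 0.5], [1.0, 0.0], [0.0, 0.3]]
loss = weighted_cross_entropy(logits, labels, weights)
```

With these toy labels the rare class gets weight 2.0 and the frequent class gets 2/3, so an error on the rare entity class contributes three times as much to the loss as an error on the frequent one.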
Authors
Ben Yanyan (本妍妍); Pang Xueqin (庞雪芹)
School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China; Archives of Wuhan University of Science and Technology, Wuhan 430081, China
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSCD
Peking University Core Journals (北大核心)
2023, Issue 5, pp. 123-132 (10 pages)
Funding
This work is one of the research outcomes of a National Natural Science Foundation of China project (Grant No. 11971185).
Keywords
Chinese Named Entity Recognition
Online Medical Consultation
Word Information Embedding
MacBERT
Weighted Cross Entropy