Multi-label CRF based method for disease extraction
Abstract: Named entity recognition in biomedical text is important for building and mining large clinical databases that support clinical decision-making, and one of its basic tasks is recognizing disease names. Medical text contains a large number of compound disease names, from which the individual entities are difficult to separate and extract. To address this problem, the paper proposes a multi-label conditional random field (CRF) algorithm. First, the data are annotated with multiple layers of labels, each layer targeting a different disease within a compound disease name. Next, the layers are merged into a single final label, which is used to train the model. Finally, the labels predicted by the model are separated back into their layers, and the diseases are identified from each layer. The method can recognize compound disease names that a traditional CRF algorithm cannot, and the experimental results verify the effectiveness of the proposed algorithm.
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2017, No. 1, pp. 118-122 (5 pages)
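Funding: Key Program of the National Natural Science Foundation of China (61133012); National Natural Science Foundation of China (61173062); Major Bidding Project of the National Philosophy and Social Science Program (11&ZD189)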
Keywords: named entity recognition; conditional random fields; multi-label; medical text; composite entity
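The label merging and splitting steps described in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the tag scheme, so the B/I/O tags, the `|` separator, the function names, and the English example phrase are all hypothetical. The CRF training and prediction step that would sit between merging and splitting is omitted.

```python
from typing import List

SEP = "|"  # hypothetical separator used to join per-layer tags


def merge_layers(layers: List[List[str]]) -> List[str]:
    """Combine the per-layer tags of each token into one composite tag."""
    if len({len(layer) for layer in layers}) != 1:
        raise ValueError("all layers must have the same number of tags")
    return [SEP.join(tags) for tags in zip(*layers)]


def split_layers(merged: List[str]) -> List[List[str]]:
    """Inverse of merge_layers: recover one tag sequence per layer."""
    columns = [tag.split(SEP) for tag in merged]
    return [list(layer) for layer in zip(*columns)]


def extract_entities(tokens: List[str], tags: List[str]) -> List[str]:
    """Read entities from one layer.

    Assumption: "B" starts an entity and every later "I" extends it,
    even across "O" tokens, so a discontinuous name can be recovered.
    """
    entities: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
    if current:
        entities.append(" ".join(current))
    return entities


# Hypothetical compound disease name containing two diseases.
tokens = ["breast", "and", "ovarian", "cancer"]
layer1 = ["B", "O", "O", "I"]  # layer for "breast cancer"
layer2 = ["O", "O", "B", "I"]  # layer for "ovarian cancer"

# Step 2 of the abstract: merge layers into the labels a CRF would be trained on.
merged = merge_layers([layer1, layer2])

# Step 3: split predicted composite labels back into layers (here the
# gold labels stand in for model predictions) and read off the diseases.
recovered = split_layers(merged)
diseases = [name for layer in recovered for name in extract_entities(tokens, layer)]
print(merged)
print(diseases)
```

Joining the layers into one composite tag lets a standard single-label sequence model be trained without modification, at the cost of a larger label set; whether the paper encodes the merged label this way is not stated in the abstract.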