Extraction Model of Measurable Quantitative Information Based on Position Feature and Dependency Tree

Abstract: With the continuous advance of medical informatization, electronic medical records (EMRs) are used ever more widely. Their unstructured text contains a large amount of measurable quantitative information that reflects patients' clinical conditions. Because entities and quantitative expressions are described in complex ways, accurately extracting measurable quantitative information from unstructured EMR documents is a significant challenge. This study proposes RPA-GRU, a model built on a bidirectional gated recurrent unit (BiGRU) that combines a relative position feature with an attention mechanism: the relative position feature is incorporated into the attention mechanism to update the BiGRU outputs, and the model then identifies entities and quantitative information. It further proposes GATM, which uses a graph attention network over a reconstructed dependency tree to learn a graph-level representation and thereby associates entities with their quantitative information. Experiments on 1359 EMRs from the burn department of a Class-A tertiary hospital show that RPA-GRU and GATM achieve F1 scores of 97.58% and 97.86% on the identification and association of measurable quantitative information, respectively, 2.17% and 1.74% higher than the best-performing baseline models. These results validate the effectiveness of the proposed models.
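The abstract names the two mechanisms but does not give their formulas. The sketch below illustrates, under stated assumptions, one common way each idea can be realized; it is not the authors' published formulation. For RPA-GRU it assumes self-attention over the BiGRU outputs with a scalar bias per clipped relative distance and a residual update. For GATM it assumes a single-head graph attention layer over the dependency arcs (treated as undirected, with self-loops) followed by mean pooling as the graph-level readout. The function names `rel_pos_attention` and `gat_graph_embedding`, the clipping distance, and all parameter shapes are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rel_pos_attention(hidden, rel_bias, max_dist=2):
    """Hypothetical relative-position-aware self-attention over BiGRU outputs.

    hidden:   list of n vectors standing in for BiGRU hidden states
    rel_bias: dict mapping each clipped relative distance in
              [-max_dist, max_dist] to a scalar bias (assumed learned)
    Returns each hidden state plus its attention context (residual update).
    """
    n, d = len(hidden), len(hidden[0])
    updated = []
    for i in range(n):
        scores = []
        for j in range(n):
            rel = max(-max_dist, min(max_dist, j - i))  # clip relative distance
            scores.append(dot(hidden[i], hidden[j]) / math.sqrt(d) + rel_bias[rel])
        alpha = softmax(scores)
        context = [sum(alpha[j] * hidden[j][k] for j in range(n)) for k in range(d)]
        updated.append([h + c for h, c in zip(hidden[i], context)])
    return updated


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_graph_embedding(features, edges, weight, attn):
    """Hypothetical single-head GAT layer over a dependency tree plus mean readout.

    features: list of node (token) vectors of length d_in
    edges:    (head, dependent) index pairs from the dependency parse
    weight:   d_out x d_in matrix W
    attn:     attention vector a of length 2 * d_out
    """
    n = len(features)
    neighbors = {i: {i} for i in range(n)}  # every node attends to itself
    for head, dep in edges:  # assumption: arcs are used in both directions
        neighbors[head].add(dep)
        neighbors[dep].add(head)
    wh = [[dot(row, x) for row in weight] for x in features]  # W h_i
    d_out = len(weight)
    node_out = []
    for i in range(n):
        nbrs = sorted(neighbors[i])
        # wh[i] + wh[j] is list concatenation, i.e. [W h_i || W h_j]
        scores = [leaky_relu(dot(attn, wh[i] + wh[j])) for j in nbrs]
        alpha = softmax(scores)
        node_out.append([sum(a * wh[j][k] for a, j in zip(alpha, nbrs)) for k in range(d_out)])
    # graph-level representation: average of the node representations
    return [sum(v[k] for v in node_out) / n for k in range(d_out)]
```

For example, with an identity `weight` and an all-zero `attn`, every neighbor receives equal weight, so `gat_graph_embedding([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (0, 2)], [[1.0, 0.0], [0.0, 1.0]], [0.0] * 4)` averages the neighbor means to roughly `[0.722, 0.556]`; in a trained model `W`, `a`, and the relative-position biases would be learned, and a classifier on the graph embedding would decide whether an entity and a quantity are associated.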

Authors: NIE Wen-Jie, MO Di, HUANG Bang-Rui, LIU Hai, HAO Tian-Yong (School of Computer Science, South China Normal University, Guangzhou 510631, China; School of Artificial Intelligence, South China Normal University, Foshan 528225, China)

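Affiliations: School of Computer Science, South China Normal University; School of Artificial Intelligence, South China Normal University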

Source: Computer Systems & Applications, 2022, No. 10, pp. 279-287 (9 pages)

Funding: Natural Science Foundation of Guangdong Province (2021A1515011339)

Keywords: measurable quantitative information; electronic medical records; relative position feature; dependency tree; graph attention networks; information extraction

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
