Medical Entity Recognition and Label Extraction Based on Natural Language Processing (cited by: 9)
Abstract With the rapid progress of informatization, data volumes have grown explosively, and hospitals generate large numbers of medical records and data every day. Most of this content is unstructured: it is authentic, subjective, and irregular, which makes it hard to interpret and process. Because medical data is stored as unstructured text, computers cannot process and analyze it directly, so analysis is inefficient and its quality cannot be guaranteed. The methods used in current information extraction research scale poorly and have other limitations, so their degree of automation is low. This paper uses a rule description language method from natural language processing to recognize the medical named entities in unstructured data, and extracts labels through semantic analysis, turning the unstructured data into structured data and making its descriptions more accurate and uniform. The approach addresses the poor scalability of current information extraction methods and can be adapted to different scenarios.
Authors ZHAO Jun-ke (赵君珂); ZHANG Zhen-yu (张振宇); CAI Kai-yu (蔡开裕) (National University of Defense Technology, Changsha 410073, China)
Affiliation National University of Defense Technology
Source Computer Technology and Development (《计算机技术与发展》), 2019, No. 9, pp. 18-23 (6 pages)
Funding National Natural Science Foundation of China (61572514); Changsha Science and Technology Bureau project (K1705007)
Keywords natural language processing; medical data; unstructured; entity identification; label extraction
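
The abstract describes the pipeline only in outline: rules recognize medical entities in free text, and a semantic step turns them into structured labels. The Python sketch below illustrates that general idea under loud assumptions. It does not reproduce the paper's rule description language; plain regular expressions stand in for it, and a simple negation check stands in for the semantic analysis. The rule set, the entity types, and the names `RULES`, `NEGATION`, `recognize`, and `extract_labels` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical rules: one regular expression per entity type.
# A real system would use a far larger lexicon and a dedicated rule language.
RULES = {
    "SYMPTOM": re.compile(r"\b(fever|cough|headache|chest pain)\b", re.IGNORECASE),
    "DRUG": re.compile(r"\b(aspirin|ibuprofen|amoxicillin)\b", re.IGNORECASE),
    "DOSAGE": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|g|ml)\b", re.IGNORECASE),
}

# Assumed negation cue: "no", "denies", or "without", followed by at most
# two other words, immediately before the entity.
NEGATION = re.compile(r"\b(no|denies|without)\s+(?:\w+\s+){0,2}$", re.IGNORECASE)


@dataclass(frozen=True)
class Entity:
    kind: str      # entity type, a key of RULES
    text: str      # matched text, lower-cased
    start: int     # character offset where the match begins
    end: int       # character offset just past the match
    negated: bool  # True when a negation cue precedes a symptom


def recognize(text):
    """Apply every rule to the text and return entities in reading order."""
    entities = []
    for kind, pattern in RULES.items():
        for m in pattern.finditer(text):
            # Only look for negation cues inside the current sentence or clause.
            clause_start = max(text.rfind(".", 0, m.start()),
                               text.rfind(";", 0, m.start())) + 1
            prefix = text[clause_start:m.start()]
            negated = kind == "SYMPTOM" and NEGATION.search(prefix) is not None
            entities.append(Entity(kind, m.group(0).lower(), m.start(), m.end(), negated))
    return sorted(entities, key=lambda e: e.start)


def extract_labels(text):
    """Group recognized entities into a structured record of labels."""
    labels = {"symptoms": [], "absent_symptoms": [], "drugs": [], "dosages": []}
    for e in recognize(text):
        if e.kind == "SYMPTOM":
            labels["absent_symptoms" if e.negated else "symptoms"].append(e.text)
        elif e.kind == "DRUG":
            labels["drugs"].append(e.text)
        elif e.kind == "DOSAGE":
            labels["dosages"].append(e.text)
    return labels


if __name__ == "__main__":
    note = ("Patient reports fever and cough for 3 days. "
            "No chest pain. Prescribed Ibuprofen 200 mg.")
    print(extract_labels(note))
```

Keeping the rules in a table separate from the matching logic is one way to get the adaptability the abstract aims for: a new scenario needs new entries in `RULES`, not new code.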