Material data named entity recognition based on matching contextual lexical words and graph convolution
Abstract: The materials-science literature contains a wealth of knowledge, and mining it with machine learning and natural language processing is an active research topic. Named entity recognition (NER) is the first step in efficiently mining and extracting the information contained in such data. Existing NER methods have two problems: their vector representations cannot resolve polysemy (one word having several meanings), and their models usually extract contextual features while ignoring global features. To address these problems, a named entity recognition method based on matching contextual lexical words and graph convolution is proposed. First, XLNet is used to obtain dynamic contextual features of the text. Second, a long short-term memory (LSTM) network and a graph convolutional network (GCN) built on lexical words matched from the text context are used to obtain contextual features and global features, respectively. Finally, a conditional random field (CRF) outputs the label sequence. The model is validated on two different corpora. On the materials dataset, the method achieves a precision, recall, and F1 score of 90.05%, 88.67%, and 89.36%, respectively, effectively improving NER accuracy.
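The abstract does not describe how the lexicon-matched graph is constructed. The following Python sketch shows one common way such a graph could be built and passed through a single graph convolution. It assumes a lattice-style graph (character nodes linked in sequence, plus one node per matched lexicon word linked to every character it spans) and the standard symmetric-normalized GCN update ReLU(D^-1/2 (A + I) D^-1/2 H W). The helper names match_lexicon, build_adjacency and gcn_layer, the toy lexicon, and the feature vectors are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, word)
Matrix = List[List[float]]


def match_lexicon(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Span]:
    """Hypothetical helper: find every lexicon word of 2..max_len characters in the sentence."""
    matches: List[Span] = []
    for i in range(len(sentence)):
        for j in range(i + 2, min(len(sentence), i + max_len) + 1):
            if sentence[i:j] in lexicon:
                matches.append((i, j, sentence[i:j]))
    return matches


def build_adjacency(n_chars: int, matches: List[Span]) -> Matrix:
    """Assumed graph: character nodes 0..n_chars-1, followed by one node per matched word."""
    size = n_chars + len(matches)
    adj = [[0.0] * size for _ in range(size)]
    for i in range(n_chars - 1):  # link neighbouring characters
        adj[i][i + 1] = adj[i + 1][i] = 1.0
    for k, (start, end, _) in enumerate(matches):
        w = n_chars + k
        for c in range(start, end):  # link the word node to every character it covers
            adj[w][c] = adj[c][w] = 1.0
    return adj


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W), written with plain lists."""
    size = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(size)] for i in range(size)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(size)] for i in range(size)]
    d_out = len(weight[0])
    # H W: project every node's features to d_out dimensions
    hw = [[sum(h[k] * weight[k][o] for k in range(len(h))) for o in range(d_out)] for h in features]
    # Aggregate the projected neighbour features with the normalized adjacency, then apply ReLU
    return [
        [max(0.0, sum(norm[i][j] * hw[j][o] for j in range(size))) for o in range(d_out)]
        for i in range(size)
    ]


if __name__ == "__main__":
    sentence = "锂离子电池正极材料"  # "lithium-ion battery cathode material"
    lexicon = {"锂离子", "锂离子电池", "电池", "正极", "材料", "正极材料"}  # toy lexicon
    matches = match_lexicon(sentence, lexicon, max_len=5)
    adj = build_adjacency(len(sentence), matches)
    # Toy 2-d node features standing in for XLNet/LSTM outputs (an assumption).
    feats = [[1.0, i / len(adj)] for i in range(len(adj))]
    w = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]
    out = gcn_layer(adj, feats, w)
    print(len(matches), len(out), len(out[0]))  # 6 15 3
```

Stacking several such layers lets information flow through word nodes between characters that are not adjacent, which is one way to obtain the kind of global features the abstract attributes to the GCN branch. The abstract does not state how the GCN output is fused with the LSTM features before CRF decoding, so that step is left out of the sketch.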
Authors: CHEN Qian, WU Xing (School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China; Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China)
Source: Journal of Shanghai University: Natural Science Edition, 2022, No. 3, pp. 372-385 (14 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
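Funding: National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Project of Yunnan Province (202102AB080019-3, 202002AB080001-2); Zhejiang Laboratory Key Research Project (2021PE0AC02); Major Project of the Special Development Fund for the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006).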
Keywords: named entity recognition (NER); XLNet; graph convolutional network (GCN)