Hierarchical Representation of Power Text Named Entity Recognition and Project-expert Matching
(Chinese title: 基于层次化表示的电力文本命名实体识别和匹配算法) Cited by: 2
Abstract To address the difficulty of accurately matching projects with experts when reviewing science and technology project applications in the electric power domain, this paper proposes a power text named entity recognition model based on hierarchical semantic representation (Attention-RoBerta-BiLSTM-CRF, ARBC), together with a project-expert matching strategy based on semantic and pictographic dual feature space mapping. The ARBC model consists of a word embedding module, a Bidirectional Long Short-Term Memory (BiLSTM) module, and a Conditional Random Field (CRF) module. The word embedding module uses information from three levels of the power text: word, sentence, and document. Specifically, word embedding vectors are first extracted from the pre-trained RoBERTa model. The contextual representation of each sentence is then enhanced by a document-level attention mechanism based on term frequency-inverse document frequency (TF-IDF) values. Finally, the word embedding and the sentence embedding are fused by linear weighting to form the hierarchical representation vector of each word. Building on the named entities output by the ARBC model, the dual feature space mapping strategy matches entities in project texts against those of domain experts, which yields accurate project-expert matching. Experiments show that the ARBC model achieves an F1 score of 83% on a named entity recognition test set of 2000 power project abstracts, significantly higher than text representation methods based on BERT and RoBERTa. In addition, the entity matching strategy based on dual feature space mapping reaches 85% accuracy on the task of matching power texts to power experts.
Authors YANG Zheng (杨政); CAI Di (蔡迪); LI Hui-bin (李慧斌) (Electric Power Research Institute of Yunnan Power Grid Co., Ltd., Kunming 650217, China; School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China)
Source Computer and Modernization (《计算机与现代化》), 2022, No. 5, pp. 75-81 (7 pages)
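Funding General Program of the National Natural Science Foundation of China (61976173); Ministry of Education-China Mobile Artificial Intelligence Construction Project (MCM20190701).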
Keywords hierarchical representation; named entity recognition; expert matching; power text
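
The abstract describes the hierarchical word representation only in prose: TF-IDF-based attention over the tokens of a sentence, followed by a linear weighted fusion of word and sentence embeddings. A minimal Python sketch of that idea is given here. It is an illustration under assumptions, not the authors' implementation: the smoothed IDF formula, the softmax normalization of the TF-IDF scores, the fusion weight `alpha`, and the function names `tfidf_scores`, `softmax` and `hierarchical_embeddings` are all hypothetical, and toy 2-dimensional vectors stand in for RoBERTa embeddings.

```python
import math
from collections import Counter


def tfidf_scores(sentence, documents):
    """TF-IDF score of each token in `sentence`; IDF is computed over `documents`.

    `sentence` is a list of tokens and `documents` is a list of token lists.
    The smoothed IDF used here is an assumption, not taken from the paper.
    """
    n_docs = len(documents)
    tf = Counter(sentence)
    scores = []
    for tok in sentence:
        df = sum(1 for doc in documents if tok in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores.append(tf[tok] / len(sentence) * idf)
    return scores


def softmax(xs):
    """Numerically stable softmax, used to turn scores into attention weights."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_embeddings(word_vecs, scores, alpha=0.7):
    """Fuse each word vector with a TF-IDF-attended sentence vector.

    sentence_vec = sum_i softmax(scores)_i * word_vecs[i]
    fused_i      = alpha * word_vecs[i] + (1 - alpha) * sentence_vec
    `alpha` is a hypothetical fusion weight chosen for illustration.
    """
    weights = softmax(scores)
    dim = len(word_vecs[0])
    sent_vec = [sum(w * v[d] for w, v in zip(weights, word_vecs))
                for d in range(dim)]
    return [[alpha * v[d] + (1 - alpha) * sent_vec[d] for d in range(dim)]
            for v in word_vecs]


# Toy usage: two tokenized "documents" and stand-in word vectors.
docs = [["变压器", "故障", "诊断"], ["配电网", "故障", "定位"]]
sentence = docs[0]
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = hierarchical_embeddings(word_vecs, tfidf_scores(sentence, docs))
for tok, vec in zip(sentence, fused):
    print(tok, [round(x, 3) for x in vec])
```

In a full pipeline of the kind the abstract outlines, the fused vectors would be passed to the BiLSTM and CRF layers for sequence labeling; those layers are omitted here because the abstract gives no details on their configuration.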