
Original Chinese title: 增强语义表示的中文金融评价要素抽取 (cited by: 1)

Enhance Semantic Representation for Opinion Elements Extraction from Chinese Financial Text
Abstract: Chinese financial commentary text is a major source for understanding financial markets and judging the prosperity of the financial industry. Extracting and analyzing the target-opinion elements it contains can, to some extent, help decision-makers make judgments. Traditional extraction methods rely mainly on finding rules, which is labor-intensive and makes it hard to take full account of syntactic features when sentences are complex or irregular. To solve this problem, we construct BBG-BMC. BBG (BERT-BiLSTM-GAT) is a mixed word-encoding module based on multilayer graph self-attention, and BMC (BiLSTM, multi-head self-attention, CRF) is a sequence-labeling module that adds a self-attention mechanism to the classic BiLSTM-CRF model. Our contributions can be summarized as follows:

1) The syntactic dependencies between words are modeled through a graph attention network (GAT) to enhance the semantic learning of words. In this way, we not only take into account the semantic expression habits of Chinese word units, but also learn the grammatical and semantic dependencies among the elements of a target-opinion unit.
2) We integrate the context of words, their local (inner) semantics, and the syntactic relations between them. This compensates for the insufficient semantic representation of financial terms in the pre-trained BERT model.
3) We jointly extract the three elements of a target-opinion unit, namely the opinion target, the opinion degree, and the opinion word. This expands the application scenarios of target-opinion unit extraction.
4) BBG-BMC treats target-opinion unit extraction as sequence labeling, with the mixed word encoding (BBG), bi-directional long short-term memory (BiLSTM), a conditional random field (CRF), and multi-head self-attention (MHSA) as components, to improve extraction performance.

Experiments on a Chinese financial text dataset show that BBG-BMC improves F1 by 6.75% over BiLSTM-CRF, the most advanced model it is compared against.
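Contribution 1) can be illustrated with a minimal sketch of one single-head graph-attention step in which each word attends only to itself and to the words it shares a dependency arc with. This follows the generic GAT formulation and is not claimed to be the paper's exact layer; the function name `gat_layer`, the toy feature vectors, the arcs, and the parameters `W` and `a` are all assumptions made for illustration.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU as commonly used for GAT attention scores."""
    return x if x > 0 else slope * x


def gat_layer(h, edges, W, a):
    """One single-head graph-attention step (illustrative, not the paper's code).

    h     : list of n word vectors, each of length d_in
    edges : set of (i, j) pairs meaning word i may attend to word j
    W     : d_out x d_in projection matrix (list of rows)
    a     : attention vector of length 2 * d_out
    """
    # Linear projection z_i = W h_i for every word.
    z = [[sum(w * x for w, x in zip(row, hi)) for row in W] for hi in h]
    out = []
    for i in range(len(h)):
        # Neighbours of word i along dependency arcs, plus a self-loop.
        nbrs = sorted({i} | {j for (s, j) in edges if s == i})
        # Unnormalised score e_ij = LeakyReLU(a . [z_i || z_j]).
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Softmax over the neighbourhood (shifted by the max for stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # New representation: attention-weighted sum of neighbour projections.
        out.append([
            sum(al * z[j][k] for al, j in zip(alphas, nbrs))
            for k in range(len(W))
        ])
    return out


# Toy input: three words, with word 1 linked to words 0 and 2 (hypothetical arcs).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
arcs = [(1, 0), (1, 2)]
# Treat arcs as undirected so information flows both ways (an assumption).
edges = {(i, j) for i, j in arcs} | {(j, i) for i, j in arcs}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
a = [0.0, 0.0, 0.0, 0.0]       # zero attention vector gives uniform weights
enhanced = gat_layer(h, edges, W, a)
```

With the zero attention vector used here, every word simply averages itself and its syntactic neighbours; with learned values of `W` and `a`, the weights would instead favour the more informative arcs.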
Authors: CHEN Qi (陈启); LIU De-xi (刘德喜); WAN Chang-xuan (万常选); LIU Xi-ping (刘喜平); BAO Li-ping (鲍力平). School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330032, China; Jiangxi Key Laboratory of Data and Knowledge Engineering, Jiangxi University of Finance and Economics, Nanchang 330013, China.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2022, No. 2, pp. 254-262 (9 pages). Indexed in CSCD and the Peking University Core Journals list (北大核心).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61762042, 61972184, 62076112).
Keywords: Chinese financial text; target-opinion elements extraction; graph self-attention networks; bi-directional long short-term memory
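The sequence-labeling view of joint extraction described in the abstract can be sketched as decoding per-token tags into the three element types. The BIO scheme and the tag names `TARGET`, `DEGREE`, and `OPINION` are assumptions for illustration, since the abstract does not state the labeling scheme actually used, and the example sentence is invented rather than taken from the paper's dataset.

```python
def decode_elements(tokens, tags):
    """Group BIO tags into (element_type, text) spans.

    Assumed tag set (hypothetical): B-/I- prefixes over TARGET, DEGREE,
    OPINION, plus "O" for tokens outside any element.
    """
    spans = []
    cur_type, cur_tokens = None, []

    def flush():
        # Emit the span collected so far, if any.
        if cur_type is not None:
            spans.append((cur_type, "".join(cur_tokens)))

    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new element starts; close the previous one first.
            flush()
            cur_type, cur_tokens = tag[2:], [tok]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            # Continuation of the current element.
            cur_tokens.append(tok)
        else:
            # "O", or an I- tag that does not continue the open span:
            # close the current element and skip this token.
            flush()
            cur_type, cur_tokens = None, []
    flush()
    return spans


# Invented example: "bank stock prices rose sharply".
tokens = ["银行", "股", "价格", "大幅", "上涨"]
tags = ["B-TARGET", "I-TARGET", "I-TARGET", "B-DEGREE", "B-OPINION"]
unit = decode_elements(tokens, tags)
# unit == [("TARGET", "银行股价格"), ("DEGREE", "大幅"), ("OPINION", "上涨")]
```

In a model such as the one described, the tag sequence would come from the CRF layer's decoding; here it is written by hand so that only the grouping step is shown.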