Abstract
To address the lack of sufficient annotated corpora for entity-level sentiment analysis in the financial domain, and the difficulty that general-purpose sentiment analysis models have in handling financial text, this paper builds a million-scale corpus for financial entity sentiment analysis and labels more than 5,000 financial sentiment words as a financial domain sentiment lexicon. Based on this dataset, we propose a fine-grained sentiment analysis model for financial text that combines the financial sentiment lexicon with an attention mechanism: an Attention-based Recurrent Network Combined with Financial Lexicon, called FinLexNet. The model uses two LSTM networks, one to extract word-level semantic information and another to extract word-category-level information derived from how the lexicon classifies each word, so that it can capture the features of financial domain words effectively. In addition, so that financial sentiment words in the text receive more attention, we propose an attention mechanism based on the financial sentiment lexicon that obtains the important sentiment information for each entity. Experiments on the constructed entity-level financial corpus show that the model achieves better performance than the baseline models.
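The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of lexicon-guided attention, not the authors' FinLexNet implementation. It assumes a toy lexicon (`TOY_LEXICON`), an additive `bonus` for words found in the lexicon, and placeholder relevance scores standing in for what a trained network would produce; all of these names and values are hypothetical.

```python
import math

# Hypothetical toy lexicon: word -> sentiment category.
# These entries are illustrative only and are not taken from the paper's dictionary.
TOY_LEXICON = {
    "surge": "positive",
    "profit": "positive",
    "loss": "negative",
    "default": "negative",
}


def word_categories(tokens, lexicon):
    """Map each token to its lexicon category, or 'none' if it is not a sentiment word."""
    return [lexicon.get(tok, "none") for tok in tokens]


def lexicon_attention(scores, categories, bonus=1.0):
    """Softmax over relevance scores with an additive bonus for lexicon words.

    `scores` stands in for entity-conditioned relevance scores from a trained
    model; `bonus` is an assumed hyperparameter, not a value from the paper.
    """
    biased = [s + (bonus if c != "none" else 0.0) for s, c in zip(scores, categories)]
    peak = max(biased)  # subtract the maximum for numerical stability
    exps = [math.exp(b - peak) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["bank", "reports", "heavy", "loss"]
cats = word_categories(tokens, TOY_LEXICON)
# With equal base scores, only the lexicon bonus separates the words.
weights = lexicon_attention([0.0, 0.0, 0.0, 0.0], cats)
```

In this sketch the category sequence `cats` plays the role of the word-category-level input, and the weights show how a sentiment word can be given a larger share of attention than neutral words with the same base score.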
Authors
祝清麟
梁斌
徐睿峰
刘宇瀚
陈奕
毛瑞彬
ZHU Qinglin; LIANG Bin; XU Ruifeng; LIU Yuhan; CHEN Yi; MAO Ruibin (Department of Computer Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China; Shenzhen Securities Information Co. Ltd., Shenzhen, Guangdong 518028, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal
2022, No. 8, pp. 109-117 (9 pages)
Journal of Chinese Information Processing
Funding
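National Natural Science Foundation of China (61876053, 62006062)
Shenzhen Key Technology Research Project (JSGG20210802154400001)
Shenzhen Basic Research Discipline Layout Project (JCY20210324115614039)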
Keywords
fine-grained sentiment analysis
financial texts
financial sentiment lexicon