Abstract
This article analyzes the sentiment of annual reports using three approaches: a dictionary method, machine learning, and transfer learning. The dictionary method builds dictionaries of positive and negative words and uses the proportions of positive and negative words in an annual report as the basis for sentiment judgment. The machine learning approach covers random forest, support vector regression, and LGB. It constructs an ordered dictionary of high-frequency words, extracts features from the annual report text to obtain a word-frequency feature matrix as model input, and derives a sentiment indicator from the cumulative excess return within the month after the annual report is disclosed, which forms the sentiment factor. The transfer learning approach uses a word-granularity Chinese BERT. To handle the very long text of annual reports, it constructs eight major catalog features and applies the MemRecall mechanism of CogLTX to further process the long text under each catalog feature. The results show that the sentiment factors produced by transfer learning perform best, followed by those produced by machine learning, while the sentiment factors constructed by the dictionary method perform poorly on very long texts.
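The dictionary method and the word-frequency feature extraction described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicons `POSITIVE` and `NEGATIVE`, the function names, the net-tone formula, and the `top_k` cutoff are all hypothetical, and the text is assumed to be already segmented into tokens (Chinese annual reports would need a word segmenter first).

```python
from collections import Counter

# Hypothetical toy lexicons; the article's actual dictionaries are not given in the abstract.
POSITIVE = {"growth", "profit", "improve"}
NEGATIVE = {"loss", "decline", "risk"}


def dictionary_tone(tokens):
    """Return (positive share, negative share, net tone) for one tokenized report."""
    total = len(tokens)
    if total == 0:
        return 0.0, 0.0, 0.0
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    # Net tone as (pos - neg) / total is an assumed scoring rule for illustration.
    return pos / total, neg / total, (pos - neg) / total


def build_vocabulary(docs, top_k):
    """Ordered high-frequency vocabulary: the top_k tokens by corpus frequency."""
    counts = Counter(t for doc in docs for t in doc)
    # Sort by descending count, then alphabetically so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:top_k]]


def term_frequency_matrix(docs, vocab):
    """One row per document, one column per vocabulary word, entries are raw counts."""
    index = {tok: j for j, tok in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for t in doc:
            j = index.get(t)
            if j is not None:
                row[j] += 1
        matrix.append(row)
    return matrix


# Two tiny pre-tokenized "reports" used only to exercise the functions.
docs = [
    "profit growth improve profit".split(),
    "loss risk decline profit".split(),
]
vocab = build_vocabulary(docs, top_k=3)
features = term_frequency_matrix(docs, vocab)
tones = [dictionary_tone(d) for d in docs]
```

In a setup like the one the abstract describes, a matrix such as `features` would be the input to a regressor (for example random forest or support vector regression) trained against the cumulative excess return after disclosure, while `tones` corresponds to the simpler dictionary-based signal.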
Authors
陈雨涵
周金花
CHEN Yuhan; ZHOU Jinhua (Technology Center, China Baowu Shared Services Co., Shanghai 200940; Research Office, Shanghai Lixin University of Accounting and Finance, Shanghai 201209)
Source
《上海立信会计金融学院学报》
2024, No. 1, pp. 40-55 (16 pages)
Journal of Shanghai Lixin University of Accounting and Finance
Funding
Shanghai Philosophy and Social Science Planning Project (2021BGL004).
Keywords
Sentiment analysis
Dictionary method
Machine learning
Transfer learning
Cumulative abnormal return
Financial text