Feature Extraction from Guba Short-text Messages and Stock Return Prediction: A Multistep Dimension Reduction Framework
Abstract: Social media is an important source of financial information for stock market investors, and the sentiment and other signals carried in users' posts are closely related to stock price movements. However, because these posts are casually expressed, highly colloquial, short, and low in semantic density, traditional text analysis methods based on sentiment dictionaries risk losing valuable words. The multistep dimension reduction framework recently proposed by Fan et al. (2021) attempts to improve the precision of key information extraction by making full use of the semantic features of the text itself in a data-driven way. This paper extends that framework to the social media setting and systematically explores whether short texts from the Eastmoney Guba forum provide effective leading information for individual stock price movements. Specifically, principal component analysis is first used to extract common factors from the text; variable screening is then applied to the residual matrix to further filter the word features; finally, Lasso regression is used to build the prediction model. In this way, dimension reduction is achieved while the stock-specific semantics contained in the text are extracted more fully. The results show that the framework can extract information from Guba short texts that is useful for predicting individual stock returns. In addition, the set of words identified as having predictive power reflects characteristics that distinguish social media short texts from other financial texts, and it differs considerably from traditional financial sentiment dictionaries. The multistep dimension reduction framework therefore offers a new approach to analyzing social media short-text data.
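The three steps named in the abstract (common factors by principal component analysis, variable screening on the residual matrix, then Lasso regression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the screening statistic, the number of factors, or the tuning parameters, so the marginal-association score, the step that regresses the response on the estimated factors, the function names (`multistep_fit`, `lasso_cd`, `soft_threshold`), the defaults (`n_factors`, `n_keep`, `alpha`), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # remove column j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta

def multistep_fit(X, y, n_factors=2, n_keep=20, alpha=0.05):
    """Hypothetical three-step pipeline: PCA factors -> screening -> Lasso."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Step 1: estimate common factors with PCA (via SVD) and form residuals.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    factors = U[:, :n_factors] * S[:n_factors]
    resid_X = Xc - factors @ Vt[:n_factors]
    # Assumption: the response is also adjusted for the estimated factors.
    gamma, *_ = np.linalg.lstsq(factors, yc, rcond=None)
    resid_y = yc - factors @ gamma

    # Step 2: screen residual columns by absolute marginal association.
    sd = resid_X.std(axis=0)
    sd[sd == 0] = 1.0
    score = np.abs(resid_X.T @ resid_y) / (len(y) * sd)
    keep = np.argsort(score)[::-1][:n_keep]

    # Step 3: Lasso on the screened, standardised residual columns.
    Z = resid_X[:, keep] / sd[keep]
    beta = lasso_cd(Z, resid_y, alpha)
    selected = keep[beta != 0]
    return selected, beta, gamma

# Synthetic stand-in for a document-term matrix (not real Guba data):
# two common factors plus idiosyncratic noise; the response depends on
# the idiosyncratic parts of columns 3 and 7.
rng = np.random.default_rng(0)
n, p = 400, 50
f = rng.normal(size=(n, 2))
B = rng.normal(size=(2, p))
u = rng.normal(size=(n, p))
X = f @ B + u
y = 0.5 * f[:, 0] + 2.0 * u[:, 3] - 2.0 * u[:, 7] + 0.1 * rng.normal(size=n)

selected, beta, gamma = multistep_fit(X, y)
print(sorted(selected.tolist()))
```

In this sketch, the columns returned in `selected` play the role of the "words with predictive power" described in the abstract; with real data, `X` would be a word-frequency matrix built from posts and `y` the subsequent stock return, and the tuning parameters would need to be chosen by validation.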
Authors: LU Shan; WANG Huiwen; ZHAO Jichang (School of Statistics and Mathematics, Central University of Finance and Economics, Beijing 100081, China; School of Economics and Management, Beihang University, Beijing 100191, China; Key Laboratory of Complex System Analysis, Management and Decision (Beihang University), Ministry of Education, Beijing 100191, China)
Source: China Journal of Econometrics (《计量经济学报》), CSCD, 2023, No. 3, pp. 707-721 (15 pages)
Funding: National Natural Science Foundation of China (72021001, 72001222, 71871006).
Keywords: social media; short-text data; principal component analysis; variable selection; stock return prediction