Abstract
With economic globalization, the international economic situation is becoming increasingly complex, and Chinese listed companies face greater challenges. Unstable conditions such as trade friction and financial-market volatility raise the credit risk of listed companies. A credit risk early warning system helps managers detect a company's financial problems in time and respond to and prevent them. The large volume of text that listed companies disclose contains useful information that can supplement traditional quantitative financial indicators. As an important part of the annual report, the Management Discussion and Analysis (MD&A) section contains managers' evaluation of the company's past operations and their outlook on future market developments. Mining the valuable textual information in MD&A can therefore effectively supplement financial-indicator information and help predict a company's credit risk.
This paper applies text mining (natural language text analysis and computer-based quantification) to the MD&A sections of the annual reports of 1,297 listed companies from 2008 to 2018. From the perspectives of text quality features, text vocabulary features, and text tone features, it quantifies text disclosure indicators in three dimensions: text similarity, text sentiment value, and text readability. Four methods (the Logistic model, decision tree, support vector machine, and neural network) are used to build credit risk early warning models for listed companies, and the predictive ability of models that combine the text disclosure indicators with financial indicators is tested empirically. The results show that: (1) after the text disclosure indicators are added, the prediction accuracy of the credit risk early warning models improves significantly, and multi-dimensional text disclosure indicators improve accuracy more than single-dimensional ones; (2) when the sample size is small, the Logistic regression model predicts more accurately than the decision tree, support vector machine, and neural network, but as the sample size grows, the prediction accuracy of the support vector machine and neural network rises markedly; (3) textual information with different characteristics is significantly correlated with whether a company experiences credit risk. The findings provide methods and empirical evidence for improving the prediction accuracy of credit risk early warning, help business owners guard against credit risk, and offer investors and researchers a new perspective on market efficiency.
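The abstract names the three MD&A text indicators but does not give their formulas. The sketch below shows one common way to compute each, under assumptions that are ours rather than the authors': similarity is the cosine similarity of term-frequency vectors (for example, a firm's MD&A in year t against year t-1), sentiment is a net tone score from positive and negative word lists, and readability is proxied by average sentence length. The word lists `POSITIVE`/`NEGATIVE` and all function names are hypothetical. Chinese MD&A text would first need word segmentation (e.g. with a segmenter such as jieba) so that tokens are space-separated.

```python
import math
import re
from collections import Counter

# Hypothetical toy sentiment lexicon (assumption); the paper's dictionary
# is not specified in the abstract.
POSITIVE = {"growth", "improve", "strong", "increase", "profit"}
NEGATIVE = {"decline", "loss", "risk", "weak", "decrease"}


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens. Assumes Chinese text was already segmented
    into space-separated words before being passed in."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of the term-frequency vectors of two documents."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def net_tone(text: str) -> float:
    """(positive - negative) / (positive + negative), in [-1, 1];
    0.0 when no lexicon word occurs."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def avg_sentence_length(text: str) -> float:
    """Mean tokens per sentence, a simple readability proxy
    (longer sentences are assumed harder to read)."""
    sentences = [s for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


if __name__ == "__main__":
    prev_mdna = "Revenue showed strong growth. We expect profit to increase."
    curr_mdna = "Revenue growth slowed. Market risk may cause a decline in profit."
    print("similarity:", cosine_similarity(prev_mdna, curr_mdna))
    print("net tone:", net_tone(curr_mdna))
    print("avg sentence length:", avg_sentence_length(curr_mdna))
```

In the setup the abstract describes, indicators like these would be appended to the financial indicators as inputs to the Logistic, decision tree, support vector machine, and neural network models; the exact feature construction used by the authors may differ.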
Authors
李成刚
贾鸿业
赵光辉
付红
LI Cheng-gang; JIA Hong-ye; ZHAO Guang-hui; FU Hong (School of Big Data Applications and Economics, Guizhou University of Finance and Economics, Guiyang 550025, China; Guizhou Key Laboratory of Big Data Statistics and Analysis, Guizhou University of Finance and Economics, Guiyang 550025, China; School of Statistics, Tianjin University of Finance and Economics, Tianjin 300000, China; School of Business Administration, Guizhou University of Finance and Economics, Guiyang 550025, China; School of Management, Hefei University of Technology, Hefei 230009, China)
Source
Chinese Journal of Management Science (《中国管理科学》)
Indexed in: CSSCI, CSCD, Peking University Core Journals (北大核心)
2023, No. 2, pp. 18-29 (12 pages)
Funding
Guizhou Key Laboratory of Big Data Statistics and Analysis (grant Qiankehe Pingtai Rencai [2019] 5103).
Keywords
management discussion and analysis
text disclosure quality
text mining
credit risk warning