Abstract
Detecting and preventing corporate financial statement fraud is an important part of audit work and an active research topic in data analysis. Beyond accounting-based methods, machine learning and many other methods are now widely applied. Based on the Accounting and Auditing Enforcement Releases published on the U.S. Securities and Exchange Commission website, this paper selects 45 U.S.-listed companies with financial statement fraud and 47 U.S.-listed companies not reported to have committed such fraud. It extracts sentiment word frequencies from the Management's Discussion and Analysis section and related financial features from the financial statements, then compares four algorithms: random forest, extremely randomized trees (Extra-Trees), recurrent neural network, and long short-term memory (LSTM) network. Judged jointly on accuracy and area under the ROC curve (AUC), extremely randomized trees perform best at identifying financial statement fraud, followed by random forest. On the training set, extremely randomized trees and random forest reach accuracies of 80.68% and 79.16% and AUCs of 84.96% and 82.16%, respectively, which indicates good classification performance. Finally, a feature importance analysis shows that four important features are shared by extremely randomized trees and random forest during training: selling, general and administrative expenses; total share capital; accounts payable; and the cash flow ratio.
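As a rough illustration of the two evaluation metrics named in the abstract, the following minimal sketch computes accuracy and AUC in plain Python. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for the example, and the AUC is computed with the pairwise ranking definition (the probability that a randomly chosen fraud case scores higher than a randomly chosen non-fraud case).

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper): 1 = fraud, 0 = no reported fraud.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]  # assumed model scores for the fraud class
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.4f}, AUC={auc:.4f}")
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank the two classes, which is one reason the two metrics are often reported together.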
Authors
CHENG Jianhua (程建华)
TIAN Huimin (田慧敏)
School of Big Data and Statistics, Anhui University, Hefei, Anhui 230601, China
Source
Journal of Chengdu University of Technology: Social Sciences (《成都理工大学学报(社会科学版)》)
2022, No. 5, pp. 21-32 (12 pages)
Funding
Anhui Provincial Philosophy and Social Science Planning General Project (AHSKF2019D019).
Keywords
Financial Statement Fraud
Emotional Vocabulary
Extreme Random Trees
Random Forest
Financial Characteristics