Abstract
The advent of the digital era has made the use of machine learning to identify financial fraud a research hotspot. Building on raw financial data, this paper introduces financial ratios, corporate governance indicators, audit indicators and indicators specific to China's capital market. Taking Logistic models (M-Score, F-Score and C-Score) as the evaluation benchmark, it applies decision tree, random forest, Adaboost-decision tree and support vector machine (SVM) models. Sample imbalance is reduced by oversampling; recall is the primary criterion for evaluating each model, and accuracy, recall and AUC are used together to judge the quality of the models and the data. The study finds that models incorporating financial ratios, audit indicators and indicators specific to China's capital market achieve better identification performance, whereas corporate governance indicators do not improve the models' ability to identify fraud. Compared with the other models, the random forest and Adaboost-decision tree models identify fraud more effectively, with accuracy reaching 62% and 64% and recall reaching 64% and 62%, respectively.
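The evaluation setup named in the abstract (oversampling to reduce class imbalance, then scoring by accuracy, recall and AUC) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names are hypothetical, random duplication of minority rows is only one possible form of oversampling (the abstract does not say which variant was used), and the labels and scores are made-up toy values, with 1 standing for a fraud case.

```python
import random


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    Assumption: plain random oversampling; the abstract does not specify the variant.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def accuracy(y_true, y_pred):
    """Share of all cases that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of true fraud cases (label 1) that the model flags."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5).

    Assumes both classes are present in y_true.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 2 fraud cases among 8 firms, with model scores.
y_true = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

X_toy = [[s] for s in scores]  # one placeholder feature per firm
X_bal, y_bal = random_oversample(X_toy, y_true)

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
area = auc(y_true, scores)
print(acc, rec, round(area, 4), len(y_bal), sum(y_bal))
```

On imbalanced data a model can reach high accuracy while missing most fraud cases, which is why recall is singled out as the primary criterion. In practice the oversampling step would be applied to the training split only, so that duplicated rows do not leak into the evaluation data.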
Authors
YU Li-sheng (于李胜)
ZHENG Tian-yu (郑天宇)
TENG Chuan-hao (滕传浩)
Center for Accounting Studies, Xiamen University, Xiamen 361005, Fujian; School of Management, Xiamen University, Xiamen 361005, Fujian
Source
Journal of Xiamen University (A Bimonthly for Studies in Arts & Social Sciences) (《厦门大学学报(哲学社会科学版)》)
CSSCI
PKU Core
2023, No. 2, pp. 45-56 (12 pages)
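Funding
National Natural Science Foundation of China, General Program, "Research on the Mechanism by Which Information Disclosure Affects the Development of the Real Economy" (71972161)
National Natural Science Foundation of China, General Program, "Research on the Mechanisms Affecting Bank Systemic Risk from the Perspective of Accounting Standards" (71972162)
President's Fund Innovation Team Project (20720191087)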
Keywords
digital economy
machine learning
fraud identification
public company