Abstract
[Objective] This paper studies the detection of financial fraud among companies listed on the Growth Enterprise Market (GEM) and builds anomaly detection models to identify fraudulent companies. [Methods] We constructed a financial fraud anomaly detection framework based on data fusion. In the data layer, we fused multi-source heterogeneous data, covering structured and text data as well as financial and non-financial information, and constructed features from them. In the information layer, we combined different sampling methods with ensemble classification models. In the knowledge layer, we incorporated the current state of the domain to construct the model evaluation indicators. [Results] After imbalance handling, the models outperformed their untreated counterparts on all evaluation indicators. The optimized SMOTE+ENN+LightGBM model achieved an Fβ of 0.7738. In addition, detection with multiple feature types outperformed detection with a single feature type. [Limitations] The proposed method mainly flags companies in the market that are suspected of financial fraud; it cannot distinguish or determine the specific type of fraud. [Conclusions] Handling class imbalance improves the model's ability to identify abnormal samples, and fusing multi-source heterogeneous data has a positive effect on identifying financial fraud, providing a reference for regulators detecting financial fraud in listed companies.
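The abstract reports an Fβ score without stating β or how the measure is defined. As a minimal sketch of the standard F-beta definition, the helper below computes it from confusion-matrix counts for the positive (fraud) class. The function name `f_beta`, the default β = 2, and the example counts are illustrative assumptions and are not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Standard F-beta score for the positive class.

    tp, fp, fn are true positives, false positives and false negatives.
    beta > 1 weights recall more heavily than precision; beta = 2 is an
    assumed default here, since the abstract does not state the value used.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta ** 2
    # Equivalent to (1 + b2) * P * R / (b2 * P + R), written in counts
    # so that it stays defined when precision or recall is zero.
    denom = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1 + b2) * tp / denom


# Hypothetical counts: 30 frauds caught, 20 false alarms, 10 frauds missed.
f1 = f_beta(30, 20, 10, beta=1.0)  # precision 0.60, recall 0.75
f2 = f_beta(30, 20, 10, beta=2.0)  # recall-weighted variant
```

With recall above precision, as in these made-up counts, the β = 2 score is higher than the β = 1 score, which is why a recall-weighted Fβ is a common choice when missing a fraudulent company is costlier than a false alarm.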
Authors
李爱华
王迪文
续维佳
李子沫
姚思涵
Li Aihua; Wang Diwen; Xu Weijia; Li Zimo; Yao Sihan (School of Management Science and Engineering, Central University of Finance and Economics, Beijing 100081, China)
Source
《数据分析与知识发现》
CSCD
Peking University Core Journal
2023, Issue 5, pp. 33-47 (15 pages)
Data Analysis and Knowledge Discovery
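Funding
This work is one of the research outcomes of the following projects:
National Natural Science Foundation of China (Grant No. 71932008)
Fundamental Research Funds for the Central Universities (Grant No. 20170065)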
Keywords
Financial Fraud
Data Fusion
Anomaly Detection
Imbalanced Data