Abstract
[Objective] This paper integrates multi-source tax-related data and applies machine learning to identify enterprises that violate tax laws on major tax types. [Methods] We used web data acquisition, text mining, and related techniques to collect and fuse multi-source tax-related data, including corporate financial indicators, executive information, and media coverage. We then applied random forests for feature selection to build an indicator system for tax audit case selection. Finally, we treated case selection for each tax type as a separate task and trained the tasks jointly with an improved multi-task structured sparse learning method based on the focal loss function, yielding a tax-type-specific discriminant model for tax audit case selection. [Results] Experiments on real-world data show that the proposed multi-task model generalizes well and is practical to apply. Its mean recall reached 0.8309, which is 0.1351 higher than logistic regression and 0.1033 higher than traditional multi-task structured sparse learning. [Limitations] The model needs further validation on datasets beyond listed companies. [Conclusions] The proposed model identifies dishonest taxpayers more precisely and simultaneously reveals the specific tax types they evade, offering a new approach to smart tax auditing by the government.
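The abstract names two ingredients without spelling them out: a focal loss and multi-task structured sparse learning in which each tax type is one task. The paper's "improved" focal loss and its exact sparsity structure are not described here, so the following is only a minimal sketch under stated assumptions: the standard binary focal loss (Lin et al., 2017), one linear (logistic-type) model per tax type over a shared set of firms, and an L2,1 row penalty that makes all tax-type tasks keep or drop the same indicators, fitted by proximal gradient descent. All function names, hyperparameters (`alpha`, `gamma`, `lam`, `lr`) and the synthetic data are hypothetical.

```python
import numpy as np


def focal_loss_and_grad(Z, Y, alpha=0.75, gamma=2.0, eps=1e-12):
    """Binary focal loss summed over tasks and averaged over samples.

    Z: logits, shape (n, T); Y: 0/1 labels, same shape (hypothetical layout).
    alpha weights the positive (violating) class; gamma down-weights easy samples.
    Returns (loss, dloss/dZ).
    """
    p = np.exp(-np.logaddexp(0.0, -Z))           # numerically stable sigmoid
    p = np.clip(p, eps, 1.0 - eps)
    pos = Y == 1
    p_t = np.where(pos, p, 1.0 - p)               # predicted probability of the true class
    alpha_t = np.where(pos, alpha, 1.0 - alpha)
    n = Z.shape[0]
    loss = np.sum(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)) / n
    # Analytic derivative w.r.t. the logit; the sign flips for negative samples.
    dz = alpha_t * (1.0 - p_t) ** gamma * (gamma * p_t * np.log(p_t) + p_t - 1.0)
    grad = np.where(pos, dz, -dz) / n
    return loss, grad


def prox_l21(W, thresh):
    """Row-wise group soft-thresholding: prox of thresh * sum_j ||W[j, :]||_2."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return W * scale


def fit_multitask_focal(X, Y, lam=0.05, alpha=0.75, gamma=2.0, lr=0.5, n_iter=500):
    """Proximal gradient descent; column t of W is the model for tax type t."""
    d, T = X.shape[1], Y.shape[1]
    W, b = np.zeros((d, T)), np.zeros(T)
    for _ in range(n_iter):
        _, G = focal_loss_and_grad(X @ W + b, Y, alpha, gamma)
        W = prox_l21(W - lr * (X.T @ G), lr * lam)  # one indicator kept or dropped for all tasks
        b = b - lr * G.sum(axis=0)                  # per-task intercepts are not penalized
    return W, b


def recall(y_true, y_pred):
    """Share of true violators (label 1) that the model flags."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    return tp / max(int(np.sum(y_true == 1)), 1)


if __name__ == "__main__":
    # Synthetic stand-in: 10 indicators, 3 hypothetical tax types, only 3 informative indicators.
    rng = np.random.default_rng(0)
    n, d = 400, 10
    X = rng.standard_normal((n, d))
    W_true = np.zeros((d, 3))
    W_true[:3] = [[2.0, -1.5, 1.0], [-1.5, 1.0, 2.0], [1.0, 2.0, -1.5]]
    Y = (X @ W_true + 0.5 * rng.standard_normal((n, 3)) > 0).astype(int)
    W, b = fit_multitask_focal(X, Y)
    pred = (X @ W + b > 0).astype(int)
    print("row norms:", np.round(np.linalg.norm(W, axis=1), 3))
    print("recall per task:", [round(float(recall(Y[:, t], pred[:, t])), 3) for t in range(3)])
```

The L2,1 penalty is one common way to let tasks share a feature structure: a row of `W` (one indicator) is either zeroed for every tax type or retained for all of them, while each tax type keeps its own coefficients. Setting `alpha` above 0.5 upweights the minority class of violating firms, which is consistent with the abstract's emphasis on recall.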
Authors
Li Guofeng
Li Zuojuan
Wang Zheji
Wu Meng
(School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China; School of Economics, Shandong University of Finance and Economics, Jinan 250014, China)
Source
Data Analysis and Knowledge Discovery
CSSCI
CSCD
Peking University Core Journal
2022, Issue 6, pp. 128-139 (12 pages)
Funding
This work is one of the outputs of a General Program project of the National Social Science Fund of China (Grant No. 19BTJ023).