基于多任务学习的税务稽查选案研究 (Tax Audit Case Selection Based on Multi-task Learning)

Cited by: 1

Identifying Tax Audit Cases with Multi-task Learning

Abstract: [Objective] This paper integrates tax-related data from multiple sources and uses machine learning to identify enterprises that commit tax violations in key tax categories. [Methods] First, we used web data collection and text mining to gather corporate financial indicators, executive information, and media coverage, and fused these multi-source data. Second, we used the random forest method for feature selection and built an indicator system for tax audit case selection. Third, we applied an improved multi-task structured sparse learning method based on the focal loss function, treating case selection for each tax category as a separate task and training the tasks jointly, to build a case-selection model for each tax category. [Results] Experiments on real-world data show that the proposed multi-task model generalizes well and is practical to apply. Its mean recall reached 0.8309, which is 0.1351 higher than logistic regression and 0.1033 higher than traditional multi-task structured sparse learning. [Limitations] The model needs further validation on datasets that are not limited to listed companies. [Conclusions] The model identifies dishonest taxpayers more accurately and simultaneously identifies the specific tax categories involved in the evasion, offering a new approach to smart tax auditing by the government.
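The abstract does not state the form of the authors' improved focal loss, so the following is only a minimal sketch of the standard binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), which down-weights easy examples so that rare positive cases (here, violating enterprises) contribute more to training. The function name, the default values of `gamma` and `alpha`, and the NumPy implementation are assumptions for illustration, not the paper's method.

```python
import numpy as np


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean standard binary focal loss (illustrative, not the paper's variant).

    y_true: array of 0/1 labels; p_pred: predicted probabilities of class 1.
    Both may be 1-D (one task) or 2-D (samples x tasks); the loss is
    averaged over all entries.
    """
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities to avoid log(0).
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    # p_t is the probability assigned to the true class of each entry.
    p_t = np.where(y == 1, p, 1.0 - p)
    # alpha_t weights positives by alpha and negatives by (1 - alpha).
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t) ** gamma shrinks the loss of well-classified entries.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With `gamma=0` and `alpha=0.5` this reduces to half the ordinary binary cross-entropy, which is a convenient sanity check. In a multi-task setting such as one task per tax category, a loss of this kind would typically be computed per task and combined with a structured sparsity penalty on the weight matrix; that combination is not shown here.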

Authors: Li Guofeng (李国锋); Li Zuojuan (李祚娟); Wang Zheji (王哲吉); Wu Meng (吴梦) (School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China; School of Economics, Shandong University of Finance and Economics, Jinan 250014, China)

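Affiliations: School of Statistics, Shandong University of Finance and Economics; School of Economics, Shandong University of Finance and Economics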

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2022, No. 6, pp. 128-139 (12 pages)

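Funding: This work is one of the research outputs of a General Project of the National Social Science Fund of China (Grant No. 19BTJ023).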

Keywords: Multi-source Data Fusion; Smart Tax Audit; Multi-task Sparse Structure Learning; Focal Loss

Classification code: F812 [Economics and Management: Public Finance]