Abstract
Corporate tax evasion methods are becoming increasingly diverse, specialized, widespread, and concealed. To better identify whether an enterprise is evading taxes, a random forest model is built in Python with the Scikit-Learn machine learning package in the Anaconda integrated development environment, and the optimal model is selected by cross-validation. The model is then used to automatically identify whether taxpayers in the automobile sales industry are evading taxes. The results show that, compared with other common classification models (k-nearest neighbors, logistic regression, decision tree, and AdaBoost), the random forest model achieves higher accuracy and better classification performance, and can meet the needs of automatic tax evasion identification.
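The workflow summarized in the abstract (a random forest tuned by cross-validation, then compared with k-nearest neighbors, logistic regression, a decision tree, and AdaBoost) might be sketched as follows. This is only an illustrative sketch, not the authors' code: the taxpayer dataset is not part of this record, so a synthetic dataset from `make_classification` stands in for it, and the parameter grid, the 5-fold setting, and all variable names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the taxpayer indicators (assumption: the real
# data would be a feature matrix X and 0/1 labels y for tax evasion).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, random_state=0
)

# Select random forest hyperparameters by 5-fold cross-validation.
# The grid values are illustrative, not taken from the paper.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

# Baseline classifiers named in the abstract, with default-like settings.
# Distance- and gradient-based models get feature scaling.
baselines = {
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Mean cross-validated accuracy for every model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in baselines.items()
}
scores["random forest (tuned)"] = search.best_score_

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

On synthetic data the ranking of the models carries no meaning; the sketch only shows how cross-validated accuracy could be compared across the five classifiers.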
Authors
WU Chao (吴超)
LUO Jin (罗璟)
Institute of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650050, China
Source
Software Guide (《软件导刊》)
2018, No. 8, pp. 13-16 (4 pages)
Keywords
random forest
machine learning
tax evasion behavior
classification algorithm