Abstract
Financial data are unstable and abnormal values take many forms, which makes abnormal data difficult to extract. To address this, an extraction method based on random forest is proposed. A decision tree is built from attributes of the financial data, including information gain, information storage capacity, and search engine. The Gini coefficient is used to calculate the category value of the data at each node, and data of the same category are assigned to the same level. If the resulting feature distribution is disordered, all data in the data set are split repeatedly until iteration yields the most accurate result. A selective ensemble algorithm is then introduced into the decision tree: based on the feature values obtained, data sharing the same features are placed in the same subcategory to keep the features consistent. Abnormal data features are input into the decision tree, and abnormal financial data values are extracted through feature search. Simulation experiments show that the proposed method extracts abnormal data with high accuracy and a low false detection rate, and achieves good results with the fewest iterations.
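The node-level Gini calculation and data splitting mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the standard Gini impurity used in decision trees (1 minus the sum of squared class proportions) and a single numeric feature, and the function names (`gini_impurity`, `split_gini`, `best_threshold`) and the toy transaction amounts are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left, right):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(values, labels):
    """Find the threshold t (rule: value <= t) with the lowest weighted impurity.

    Returns (None, parent_impurity) if no split improves on the parent node.
    """
    best = (None, gini_impurity(labels))
    # The largest value is skipped because it would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = split_gini(left, right)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: transaction amounts with hand-assigned labels.
amounts = [120, 135, 150, 160, 9800, 12000]
labels = ["normal", "normal", "normal", "normal", "abnormal", "abnormal"]

threshold, impurity = best_threshold(amounts, labels)
print(threshold, impurity)
```

A random forest repeats this kind of split search over bootstrap samples and random feature subsets, then combines the resulting trees by voting. The selective ensemble step described in the abstract would additionally choose which trees take part, and that step is not shown here.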
Author
叶正娟
YE Zheng-juan (Department of Economic Management, Hefei Science and Vocational Technology College, Hefei, Anhui 230000, China)
Source
《淮阴师范学院学报(自然科学版)》
CAS
2024, No. 1, pp. 13-19 (7 pages)
Journal of Huaiyin Teachers College (Natural Science Edition)
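Funding
Key Research Project in Humanities and Social Sciences of Anhui Higher Education Institutions, "Research on Financial Management Innovation in the Big Data Environment" (SK2018A1030).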
Keywords
random forest decision tree
Gini coefficient
information storage capacity
subcategory
recall