Abstract
The provincial communication management system (TMS) of the State Grid suffers from inconsistencies between accounts and physical assets, data entry errors, and missing data, so a large volume of data must be analyzed, processed, and reclassified. To improve the accuracy of classification learning, the many features of the data must be selected effectively. This paper applies the random forest model to feature selection. Based on several parameters, including the number of decision trees, the feature split criterion, the maximum number of features in the candidate subset for each split, and the change in model accuracy after a feature is permuted, it proposes an optimized random forest feature selection method for TMS data and verifies it experimentally.
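To make the parameters named in the abstract concrete, the following is a minimal pure-Python sketch of random forest feature selection by permutation. It is not the authors' implementation: the Gini split criterion, the synthetic data, the depth limit, and every function and variable name (`fit_forest`, `permutation_importance`, `n_trees`, `max_features`, and so on) are assumptions chosen for illustration only. Features are ranked by how much test accuracy drops when one feature column is shuffled.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, max_features, rng, depth=0, max_depth=5):
    """Grow one tree; each split looks at a random subset of max_features features."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), max_features):
        # Skip the largest value so the right branch is never empty.
        for t in sorted(set(row[f] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled feature is constant in this node
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       max_features, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       max_features, rng, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees, max_features, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_features, rng))
    return forest

def accuracy(forest, X, y):
    hits = 0
    for row, label in zip(X, y):
        votes = Counter(predict_tree(tree, row) for tree in forest)
        hits += votes.most_common(1)[0][0] == label
    return hits / len(y)

def permutation_importance(forest, X, y, seed=0):
    """Accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    base = accuracy(forest, X, y)
    drops = []
    for f in range(len(X[0])):
        column = [row[f] for row in X]
        rng.shuffle(column)
        shuffled = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
        drops.append(base - accuracy(forest, shuffled, y))
    return drops

# Synthetic stand-in for TMS records: only features 0 and 1 determine the label.
data_rng = random.Random(42)
X = [[data_rng.randint(0, 9) for _ in range(5)] for _ in range(300)]
y = [int(row[0] + row[1] > 9) for row in X]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

forest = fit_forest(X_train, y_train, n_trees=25, max_features=3)
drops = permutation_importance(forest, X_test, y_test)
selected = sorted(range(5), key=lambda f: drops[f], reverse=True)[:2]
print("accuracy drop per feature:", drops)
print("selected features:", selected)
```

In this sketch, `n_trees` and `max_features` stand in for the number of decision trees and the maximum number of features in the candidate split subset. How the paper itself tunes these values, and which split criterion it settles on, is not stated in the abstract.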
Source
《计算机科学与应用》
2020, No. 2, pp. 276-288 (13 pages)
Computer Science and Application