Abstract
To address the high computational cost and low classification accuracy of ensemble learning algorithms on multi-source datasets, a selective ensemble algorithm for multi-source data based on the frequent pattern tree is proposed. The differences between the data from each source and the true values are compared, and the Pauta criterion (3σ rule) is used to judge errors in the multi-source data. All frequent patterns of the multi-source data are extracted and converted into a compressed form to build the frequent pattern tree structure. Following the idea of dynamic selection, the degree to which the current test instance belongs to a subset of misclassified data is used to assign each base classifier an appropriate weight. A weighted harmonic mean measure balances the diversity and accuracy of the base classifiers, the frequent itemsets are normalized, and finally the base classifiers that are both highly accurate and highly diverse relative to the other base classifiers are combined to complete the selective ensemble of multi-source data. Simulation results show that the proposed ensemble learning algorithm achieves better generalization performance and running efficiency, with high classification accuracy.
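Two steps named in the abstract, the Pauta criterion and the weighted harmonic mean of accuracy and diversity, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the threshold `k`, the weight `w_acc`, and the assumption that accuracy and diversity are both scaled to (0, 1] are hypothetical choices made only for illustration, since the abstract does not specify them.

```python
from statistics import mean, pstdev


def pauta_outliers(errors, k=3.0):
    """Return indices of errors lying more than k standard deviations
    from the mean (Pauta criterion; k=3 is the usual choice)."""
    mu = mean(errors)
    sigma = pstdev(errors)  # population standard deviation
    if sigma == 0:
        return []  # all errors identical, nothing to flag
    return [i for i, e in enumerate(errors) if abs(e - mu) > k * sigma]


def weighted_harmonic_mean(accuracy, diversity, w_acc=0.5):
    """Combine accuracy and diversity (both assumed in (0, 1]) into one
    score; w_acc is a hypothetical weight on accuracy."""
    if accuracy <= 0 or diversity <= 0:
        return 0.0
    w_div = 1.0 - w_acc
    return 1.0 / (w_acc / accuracy + w_div / diversity)


def select_classifiers(stats, top_n, w_acc=0.5):
    """stats maps a classifier name to (accuracy, diversity).
    Return the top_n names ranked by weighted harmonic mean."""
    ranked = sorted(
        stats,
        key=lambda name: weighted_harmonic_mean(*stats[name], w_acc=w_acc),
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    # Illustrative (made-up) deviations between source data and true values.
    errors = [0.1] * 20 + [5.0]
    print(pauta_outliers(errors))

    # Illustrative (made-up) accuracy/diversity pairs for three classifiers.
    stats = {"a": (0.9, 0.1), "b": (0.8, 0.6), "c": (0.6, 0.5)}
    print(select_classifiers(stats, top_n=2))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a classifier that is accurate but nearly identical to the others (such as `"a"` above) ranks below classifiers that are moderately strong on both measures, which matches the selection goal the abstract describes.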
Authors
方世敏
FANG Shi-min (School of Politics, National Defence University, Shanghai 200433, China)
Source
《吉林大学学报(工学版)》
EI
CAS
CSCD
PKU Core Journals (北大核心)
2022, Issue 4, pp. 885-890 (6 pages)
Journal of Jilin University (Engineering and Technology Edition)
Funding
National Social Science Fund of China, Military Science Project (16GJ003-179).
Keywords
computer application technology
frequent pattern tree
multi-source data
selective ensemble
weight assignment
classifier