Abstract
Ensemble learning methods applied to large-scale data sets often suffer from high computational complexity, an excessive number of base classifiers, and unsatisfactory classification accuracy. To address these problems, this paper proposes a selective ensemble (ensemble pruning) algorithm based on frequent patterns. Drawing on the principles of frequent pattern mining, the algorithm maps the unpruned ensemble classifier and the sample space to a transactional database and stores the classification results in a Boolean matrix. It then mines the frequent base classifiers from this matrix and combines them into the final ensemble. Experimental results show that, compared with the ensemble classification algorithms Bagging, AdaBoost, WAVE and RFW, the proposed algorithm reduces the size of the ensemble and improves its classification accuracy and classification efficiency.
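The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual encoding or mining procedure, so everything below is an assumption made for illustration: rows of the Boolean matrix are samples (transactions), columns are base classifiers (items), an entry is true when the classifier labels the sample correctly, and "frequent" means that a set of classifiers is jointly correct on at least `min_support` of the samples. The function names, the brute-force enumeration (used here in place of a real miner such as Apriori), and the majority-vote combiner are all hypothetical.

```python
from itertools import combinations


def build_boolean_matrix(predictions, labels):
    """Assumed encoding: one row per sample, one column per base classifier.

    predictions[j][i] is the label classifier j assigns to sample i.
    An entry is True when that label matches the true label.
    """
    return [[pred[i] == y for pred in predictions] for i, y in enumerate(labels)]


def support(matrix, itemset):
    """Fraction of samples on which every classifier in itemset is correct."""
    hits = sum(all(row[j] for j in itemset) for row in matrix)
    return hits / len(matrix)


def frequent_classifier_sets(matrix, min_support, max_size=2):
    """Brute-force enumeration of frequent classifier sets up to max_size.

    This stands in for a proper frequent-pattern miner and is only
    practical for small ensembles.
    """
    n_classifiers = len(matrix[0])
    frequent = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(range(n_classifiers), size):
            s = support(matrix, itemset)
            if s >= min_support:
                frequent[itemset] = s
    return frequent


def select_classifiers(matrix, min_support, max_size=2):
    """Keep every base classifier that appears in some frequent set."""
    frequent = frequent_classifier_sets(matrix, min_support, max_size)
    return sorted({j for itemset in frequent for j in itemset})


def majority_vote(predictions, selected):
    """Combine the selected classifiers by plurality vote per sample.

    Ties are broken in favour of the smallest label.
    """
    per_sample = zip(*(predictions[j] for j in selected))
    return [max(sorted(set(votes)), key=votes.count) for votes in per_sample]
```

As a usage illustration, given a list `predictions` holding one prediction list per base classifier and a list `labels` of true labels for a validation set, `select_classifiers(build_boolean_matrix(predictions, labels), min_support=0.8)` returns the indices of the retained classifiers, which can then be passed to `majority_vote`. The threshold 0.8 is an arbitrary example value, not one taken from the paper.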
Source
Journal of Applied Sciences (《应用科学学报》)
CAS
CSCD
PKU Core (北大核心)
2013, No. 6, pp. 628-632 (5 pages)
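Funding
Supported by:
National Natural Science Foundation of China (No.61172124)
Scientific Research Program of the Shaanxi Provincial Education Department (No.12JK0739)
Xi'an Science Plan Project Fund (No.CXY1339(5))
Beilin District (Xi'an) Science and Technology Plan Project Fund (No.GX1308)
Xi'an University of Technology Characteristic Research Plan Project Fund (No.116-211302)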
Keywords
large-scale data set
frequent pattern
selective ensemble (ensemble pruning)
transactional database
Boolean matrix