Abstract
Stacked generalization suffers from inherently high complexity and from data leakage, and its results are unstable across different data samples. To address these problems, this paper proposes LBDS, a stacking algorithm based on locality-sensitive hashing (LSH). LBDS first uses LSH to map the training and test sets into hash buckets. A full bucket serves as the trigger to start training, and the model trained at that point predicts the training data, the test data, and their neighborhoods collected when a bucket next fills. The base classifiers are then filtered by stability and information-entropy conditions to generate the high-level data. Finally, the predictions obtained from high-level training are combined by mixed voting and averaging to produce the final classification. Experiments on several datasets show that LBDS improves accuracy (Acc) and AUC by 2% on average and reduces training time complexity by 10%, while showing better stability and stronger generalization ability.
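The bucket-filling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the hash family, the bucket capacity, or how neighborhoods are formed, so the random-hyperplane (sign-of-projection) hash, the `capacity` parameter, and the helper names `make_hyperplanes`, `lsh_key`, and `fill_buckets` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes (an assumed LSH family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(x, hyperplanes):
    """Sign-of-projection signature: one bit per hyperplane."""
    return tuple(
        1 if sum(w * v for w, v in zip(plane, x)) >= 0.0 else 0
        for plane in hyperplanes
    )


def fill_buckets(points, hyperplanes, capacity):
    """Insert points one at a time and note when a bucket becomes full.

    Returns (buckets, full_events). full_events lists bucket keys in the
    order their buckets first reached `capacity`; in a scheme like the one
    the abstract outlines, each such event could trigger a training round.
    """
    buckets = defaultdict(list)
    full_events = []
    for idx, x in enumerate(points):
        key = lsh_key(x, hyperplanes)
        buckets[key].append(idx)
        if len(buckets[key]) == capacity:
            full_events.append(key)
    return dict(buckets), full_events


# Hypothetical usage: 40 random 3-D points, 2-bit signatures, capacity 5.
planes = make_hyperplanes(n_bits=2, dim=3, seed=42)
rng = random.Random(1)
points = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(40)]
buckets, full_events = fill_buckets(points, planes, capacity=5)
```

Nearby points tend to share a signature under this hash, which is why a full bucket can stand in for a local neighborhood of samples. The number of bits controls how fine the partition is: more bits give smaller, more homogeneous buckets that fill more slowly.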
Authors
王俊杰
温雪岩
徐克生
于鸣
WANG Junjie; WEN Xueyan; XU Kesheng; YU Ming (College of Computer and Engineering, Northeast Forestry University, Harbin, Heilongjiang 150040, China; State Forestry Administration Harbin Forestry Machinery Research Institute, Harbin, Heilongjiang 150086, China)
Source
《广西师范大学学报(自然科学版)》
CAS
PKU Core Journal
2020, Issue 4, pp. 21-31 (11 pages)
Journal of Guangxi Normal University: Natural Science Edition
Funding
National Key Research and Development Program of China (2016YFD0702105)
Fundamental Research Funds for the Central Universities (2572017PZ10)
Keywords
stacked generalization
locality-sensitive hashing
time complexity
stability
meta-classifier