Abstract
In layered parallel SVM algorithms, sub-sample sets generated by purely random partitioning can deviate in distribution from the original sample set. To address this problem, a guided random partition method based on distributed k-means clustering is proposed. Rather than feeding the training results of one layer directly into the next, the method uses the k-means clustering algorithm to group them into N clusters, where N is the number of nodes in the next layer. Each cluster is then randomly divided into N parts, and one part is drawn at random from each cluster and recombined to form the N sub-sample sets for next-layer training, so that the distribution of each sub-sample set stays similar to that of the original sample set. The results show that the method both improves learning ability effectively and reduces the jitter of the model across repeated training runs.
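The partition step described in the abstract can be sketched as follows. This is only an illustrative single-process sketch, not the authors' implementation: a plain Lloyd-style k-means stands in for the distributed k-means clustering, and the names `kmeans`, `guided_random_partition`, `iters`, and `seed` are hypothetical. Dealing shuffled cluster members round-robin (with a per-cluster offset) is one assumed way to realize "divide each cluster randomly into N parts and give one part to each sub-sample set".

```python
import random


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means (stand-in for the distributed version).

    Returns one cluster label per point.
    """
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)  # initial centers: k random points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def guided_random_partition(points, n, seed=0):
    """Split `points` into n sub-sample sets (lists of indices).

    Each cluster is shuffled and dealt out across the n subsets, so every
    subset receives roughly 1/n of every cluster.
    """
    rng = random.Random(seed)
    labels = kmeans(points, n, rng=rng)
    subsets = [[] for _ in range(n)]
    for c in range(n):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)  # random division of the cluster
        for j, idx in enumerate(members):
            # Offset by c so leftover samples do not always land in subset 0.
            subsets[(j + c) % n].append(idx)
    return subsets, labels


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of 2-D points.
    data = [(float(i % 5), float(i // 5)) for i in range(20)]
    data += [(100.0 + i % 5, 100.0 + i // 5) for i in range(20)]
    parts, labs = guided_random_partition(data, 2, seed=1)
    for part in parts:
        print(len(part), sorted(labs[i] for i in part))
```

In a layered setup, each returned index list would select the samples sent to one node of the next layer; how the previous layer's results are pooled before this step is left out of the sketch.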
Authors
王瑞
向新
肖冰松
WANG Rui; XIANG Xin; XIAO Bingsong (Aeronautics Engineering College, Air Force Engineering University, Xi'an 710038, China)
Source
《空军工程大学学报(自然科学版)》
CSCD
Peking University Core Journals (北大核心)
2018, Issue 2, pp. 86-92 (7 pages)
Journal of Air Force Engineering University(Natural Science Edition)
Funding
Natural Science Basic Research Plan of Shaanxi Province (DF011000306)