Abstract
Traditional machine learning methods for classification tasks often incur high manual labeling costs and exhibit weak generalization ability. To address these problems, a label combination semi-supervised learning algorithm is proposed. Drawing on the idea of ensemble learning, the algorithm uses the labeled data to train multiple weak learners and combines them to enhance generalization ability. It then predicts the unlabeled data to generate noisy labels, which are combined and modeled to make the model more robust. Under a risk minimization framework, the model converges to an optimal state. Experimental results show that, in two supervised scenarios, compared with existing algorithms such as Support Vector Machine (SVM), Classification and Regression Tree (CART) and Neural Network (NN), the proposed algorithm achieves better generalization ability.
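Since the abstract only outlines the approach, the following is a minimal, hypothetical sketch of an ensemble-based self-labeling scheme in the same spirit: weak learners trained on the labeled data generate noisy labels for the unlabeled data, the noisy labels are combined by a simple majority vote, and a final model is refit on the combined sample. The choice of decision-tree weak learners, the scikit-learn API, and the majority-vote combination are assumptions for illustration, not the paper's actual label combination or risk minimization procedure.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample


def label_combination_fit(X_l, y_l, X_u, n_weak=10, random_state=0):
    """Illustrative sketch: fit a classifier from labeled (X_l, y_l) and unlabeled X_u data."""
    rng = np.random.RandomState(random_state)

    # 1. Train several weak learners on bootstrap samples of the labeled data.
    weak_models = []
    for _ in range(n_weak):
        Xb, yb = resample(X_l, y_l, random_state=rng)
        weak_models.append(
            DecisionTreeClassifier(max_depth=2, random_state=rng).fit(Xb, yb))

    # 2. Each weak learner predicts the unlabeled data, producing noisy labels.
    noisy = np.stack([m.predict(X_u) for m in weak_models])  # shape: (n_weak, n_unlabeled)

    # 3. Combine the noisy labels with a simple majority vote over the ensemble
    #    (assumes non-negative integer class labels).
    pseudo = np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, noisy)

    # 4. Refit a final model on labeled plus pseudo-labeled data, i.e. minimize
    #    the empirical risk over the combined sample.
    X_all = np.vstack([X_l, X_u])
    y_all = np.concatenate([y_l, pseudo])
    return DecisionTreeClassifier(random_state=random_state).fit(X_all, y_all)

In this sketch the final classifier would be obtained as model = label_combination_fit(X_labeled, y_labeled, X_unlabeled); the bootstrap resampling plays the role of generating diverse weak models, and the vote plays the role of the label combination step described in the abstract.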
Authors
LIN Jinchuan (林金钏); AI Haojun (艾浩军) (School of Computer Science, Wuhan University, Wuhan 430072, China)
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2019, No. 4, pp. 157-162, 168 (7 pages in total)
Funding
National Key Research and Development Program of China (2016YFB0502201)
Keywords
semi-supervised learning
ensemble learning
risk minimization
gradient descent
loss function