Abstract
Support vector machine (SVM) regression differs from SVM classification. Classification starts from maximizing the margin between two classes, whereas regression seeks a regression equation relating the independent and dependent variables, such that the values it predicts are as close as possible to the observed values of the dependent variable. SVM regression also differs from ordinary regression: it defines a band of width 2ε, incurs no loss for data points inside the band, and incurs a loss only for points outside it. While this loss is minimized, the objective function of the model also contains the additional term 1/2‖W‖². This term raises many questions and makes the objective function of SVM regression hard to understand. In particular, after studying SVM classification it is tempting to associate the term with margin maximization, which the same term expresses in the classification problem, yet the two cannot be matched directly. We therefore analyze and explain 1/2‖W‖² from five perspectives: regularization; structural risk minimization (ridge regression, weight decay); flatness of the regression hyperplane; transformation of the regression problem into a binary classification problem; and the nature of the regression problem. This clears up obstacles to understanding the objective function of the SVM regression model. The fifth perspective takes a higher-level view: starting from the nature of the problem, it unifies the first four and also offers its own distinct viewpoint.
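As a rough illustration of the objective described above, the following minimal sketch evaluates the 1/2‖W‖² term plus the ε-insensitive loss for a linear model f(x) = W·x + b. The trade-off constant `C`, the function names, and the toy data are assumptions made here for illustration (they follow the commonly used SVR formulation) and are not taken from the article.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the band |y - f(x)| <= epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(w, b, xs, ys, epsilon, C):
    """0.5 * ||w||^2 + C * (sum of epsilon-insensitive losses).

    C is an assumed trade-off constant; the abstract does not name it.
    """
    regularizer = 0.5 * sum(wi * wi for wi in w)
    total_loss = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        total_loss += epsilon_insensitive_loss(y, prediction, epsilon)
    return regularizer + C * total_loss


# Hypothetical toy data: only the third point lies outside the band.
xs = [[0.0], [1.0], [2.0]]
ys = [0.05, 1.0, 2.5]
value = svr_objective([1.0], 0.0, xs, ys, epsilon=0.1, C=1.0)
```

In this sketch the first two points fall inside the band and contribute nothing, so the objective is driven by the 1/2‖W‖² term and the single point outside the band.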
Source
《数据挖掘》 (Data Mining)
2019, Issue 2, pp. 52-59 (8 pages)
Hans Journal of Data Mining