Abstract
A loss function quantifies the information loss and error incurred in regression analysis and serves as the objective that a learning algorithm minimizes. This paper studies loss function selection for linear regression on finite data sets. For a given noise density, there exists an optimal loss function under an asymptotic (consistency) setting; for example, the squared loss is optimal for Gaussian noise. In real-life applications, however, the noise density is usually unknown and the training samples are finite. Robust statistics provides ways to select a loss function from statistical information about the noise density, but these methods rest on asymptotic assumptions and may not apply well to finite sample sets. To address this, we draw on Vapnik's ε-insensitive loss function and propose a heuristic method for setting ε as a function of the number of samples and the noise variance. Experimental comparisons on linear regression problems show that the proposed loss function is more robust and yields higher prediction accuracy than the popular squared loss and Huber's least-modulus loss.
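The three losses compared in the abstract, and the idea of tying ε to sample size and noise variance, can be sketched as below. The `heuristic_eps` formula shown is one published rule of thumb from the SVM regression literature (Cherkassky and Ma's ε = 3σ√(ln n / n)), used here purely as an illustration; the paper's own parameter-setting rule may differ.

```python
import math

def squared_loss(r):
    """Squared loss: the asymptotically optimal choice for Gaussian noise."""
    return r * r

def least_modulus_loss(r):
    """Huber's least-modulus (absolute) loss, robust to heavy-tailed noise."""
    return abs(r)

def eps_insensitive_loss(r, eps):
    """Vapnik's eps-insensitive loss: residuals inside the eps-tube cost nothing."""
    return max(0.0, abs(r) - eps)

def heuristic_eps(n, sigma):
    """Illustrative heuristic linking eps to sample count n and noise
    standard deviation sigma (Cherkassky & Ma); not necessarily the
    rule proposed in this paper."""
    return 3.0 * sigma * math.sqrt(math.log(n) / n)
```

For small residuals the ε-insensitive loss is exactly zero, which is what makes the resulting estimator insensitive to noise below the ε level; as n grows, `heuristic_eps` shrinks toward zero and the loss approaches the least-modulus loss.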
Source
Computer and Modernization (《计算机与现代化》), 2017, No. 8, pp. 1-4.
Keywords
loss function
support vector machine
squared loss function
parameter selection
VC dimension