Abstract
Data normalization is a necessary preprocessing step before training a support vector machine (SVM). Commonly used normalization methods include scaling to [-1, +1] and standardization to N(0, 1). However, no study in the existing literature has examined the scientific basis for these common choices. This paper uses empirical experiments to investigate why data should be normalized, and how normalizing versus not normalizing affects training efficiency and the predictive ability of the resulting model. Using standard benchmark datasets, SVMs were trained on the original unnormalized data, on data normalized by different methods, on artificially inverse-normalized data, and on arbitrarily selected attribute columns. For each case, the change in the objective function value with the number of iterations, the training time, and the model's test and k-fold cross-validation (k-CV) performance were recorded. The experimental results show that normalization methods which confine data values to a conventional range, such as [-0.5, +0.5] to [-5, +5] and N(0, 1) to N(0, 5), all obtain the best predictive model with the shortest training time. This work provides a scientific basis for data normalization in SVMs and in machine learning algorithms in general.
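As a rough illustration of the two families of normalization compared above, the sketch below rescales each attribute column either linearly onto an interval [lo, hi] (for example [-1, +1] or [-5, +5]) or to zero mean and a chosen standard deviation (the N(0, σ) family). It is a minimal sketch, not the authors' code: the function names `scale_to_range`, `standardize` and `normalize_columns` are hypothetical, and it assumes the second parameter in N(0, ·) denotes the target standard deviation, since the abstract does not say whether it means variance. Standardization only fixes the mean and spread of a column; it does not make the data normally distributed.

```python
from statistics import mean, pstdev


def scale_to_range(column, lo=-1.0, hi=1.0):
    """Min-max scale one feature column linearly onto [lo, hi]."""
    cmin, cmax = min(column), max(column)
    if cmax == cmin:
        # Constant column: map every value to the midpoint of the target range.
        return [(lo + hi) / 2.0] * len(column)
    span = cmax - cmin
    return [lo + (x - cmin) * (hi - lo) / span for x in column]


def standardize(column, std=1.0):
    """Shift and scale one column to mean 0 and standard deviation `std`.

    Assumption: N(0, s) in the abstract is read as "standard deviation s".
    """
    mu = mean(column)
    sd = pstdev(column)  # population standard deviation of the column
    if sd == 0:
        # Constant column: all values collapse to the mean, i.e. 0.
        return [0.0] * len(column)
    return [std * (x - mu) / sd for x in column]


def normalize_columns(rows, method):
    """Apply a per-column normalizer to a row-major data matrix (list of lists)."""
    columns = list(zip(*rows))                    # transpose rows -> columns
    scaled = [method(list(col)) for col in columns]
    return [list(r) for r in zip(*scaled)]        # transpose back to rows


if __name__ == "__main__":
    # Two attributes on very different scales (hypothetical toy data).
    data = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
    print(normalize_columns(data, lambda c: scale_to_range(c, -1.0, 1.0)))
    # prints [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

After scaling, both attributes share the same numeric range, so neither dominates the kernel distance computations merely because of its units; swapping in `lambda c: standardize(c, 5.0)` would give the N(0, 5)-style variant under the stated assumption.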
Source
Journal of Shandong Normal University (Natural Science) (《山东师范大学学报(自然科学版)》)
CAS
2016, No. 4, pp. 60-65 (6 pages)