A Selection Criterion of Fold K in Cross-validation Based on Regularized KL Distance (cited by: 4)
Abstract: In machine learning, K-fold cross-validation evaluates and selects models by partitioning the data into multiple training and test sets; however, the choice of the fold number K remains an open problem. A premise of this data partitioning is that the training set and the test set follow the same distribution, but in practice this assumption often fails. The fold K can therefore be selected by measuring the distributional consistency between the training and test sets. Intuitively, the KL (Kullback-Leibler) distance is a suitable measure, because it quantifies the difference between two distributions. However, when K is selected directly from the KL distance, experiments on multiple datasets show that the KL distance grows as K increases, which is clearly inappropriate. To address this, a selection criterion for the fold K in K-fold cross-validation based on a regularized KL distance is proposed, and the appropriate fold K is chosen by minimizing this regularized KL distance. Further experiments on multiple real datasets verify the effectiveness and rationality of the proposed criterion.
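The idea in the abstract is straightforward to prototype. Below is a minimal sketch, assuming a one-dimensional sample and a histogram estimate of the empirical train/test distributions; the penalty term lam * log(K) is a hypothetical regularizer chosen for illustration and is not necessarily the exact form proposed in the paper.

```python
# Sketch: select the fold K by minimizing a regularized train/test KL distance.
# Assumptions (not from the paper): 1-D data, histogram density estimates,
# and a lam * log(K) penalty standing in for the paper's regularizer.
import numpy as np
from scipy.stats import entropy
from sklearn.model_selection import KFold

def kl_train_test(train, test, edges, eps=1e-9):
    """Histogram-based KL(P_train || P_test) for a 1-D sample."""
    p, _ = np.histogram(train, bins=edges)
    q, _ = np.histogram(test, bins=edges)
    p = (p + eps) / (p + eps).sum()  # smooth so no bin has zero mass
    q = (q + eps) / (q + eps).sum()
    return entropy(p, q)             # sum_i p_i * log(p_i / q_i)

def regularized_kl_score(x, k, lam=0.05, bins=20, seed=0):
    """Mean train/test KL over the K folds plus a penalty growing in K.

    The lam * log(k) penalty is a hypothetical choice for illustration;
    the paper's exact regularizer may differ.
    """
    edges = np.histogram_bin_edges(x, bins=bins)  # shared bins for all folds
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    kls = [kl_train_test(x[tr], x[te], edges) for tr, te in kf.split(x)]
    return float(np.mean(kls)) + lam * np.log(k)

# Pick the fold K (here among 2..10) that minimizes the regularized score.
x = np.random.default_rng(0).normal(size=500)
best_k = min(range(2, 11), key=lambda k: regularized_kl_score(x, k))
print("selected K:", best_k)
```

Smoothing the histograms with a small eps keeps the KL distance finite when a bin is empty in one split, which becomes more likely as K grows and the test folds shrink; this is exactly the regime where, as the abstract notes, the unpenalized KL distance inflates with K.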
Authors: CHU Rong-yan, WANG Yu, YANG Xing-li, LI Ji-hong (School of Mathematical Sciences, Shanxi University, Taiyuan 030006, China; School of Modern Educational Technology, Shanxi University, Taiyuan 030006, China; School of Software, Shanxi University, Taiyuan 030006, China)
Source: Computer Technology and Development, 2021, No. 3, pp. 52-57 (6 pages)
Funding: Applied Basic Research Program of Shanxi Province (201901D111034, 201801D211002); National Natural Science Foundation of China (61806115).
Keywords: K-fold cross-validation; selection of the fold K; KL (Kullback-Leibler) distance; regularization; machine learning