
A hierarchical clustering based feature selection algorithm for ranking learning
Cited by: 3
Abstract: Large search systems must respond quickly to user queries, and they must also meet strict back-end latency constraints when computing the feature relevance of candidate documents. Feature selection improves the efficiency of machine learning. Fast feature selection methods for learning to rank usually start from the single feature with the best ranking performance. In view of this, the paper first proposes an algorithm that uses hierarchical clustering to generate the starting points for feature selection, and applies it to two existing fast feature selection algorithms. It also proposes a new feature selection method that makes full use of the clustered features. Experiments on two standard datasets show that the proposed algorithm can obtain a smaller feature subset without loss of accuracy, and achieves the best ranking accuracy on medium-sized subsets.
Authors: MENG Yu-yu (孟昱煜); CHEN Shao-li (陈绍立); LIU Xing-chang (刘兴长) (School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal list, 2019, No. 12, pp. 2211-2216 (6 pages)
Funding: Natural Science Foundation of Gansu Province (1606RJZA003); Project of the Department of Housing and Urban-Rural Development of Gansu Province (JK2015-15)
Keywords: feature selection; ranking learning; hierarchical clustering; greedy search algorithm
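
The abstract's central idea, using hierarchical clustering to produce several starting points for feature selection instead of the single best-ranking feature, can be illustrated with a minimal sketch. This is not the authors' algorithm: the record does not state the distance measure, the linkage rule, or how a starting feature is chosen from each cluster. The sketch assumes a distance of 1 - |Pearson correlation|, average linkage, and one starting feature per cluster chosen by a precomputed per-feature relevance score. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def cluster_features(columns, n_clusters):
    """Average-linkage agglomerative clustering of feature columns.

    Assumption: the distance between two features is 1 - |Pearson correlation|.
    Returns a list of clusters, each a sorted list of feature indices.
    """
    n = len(columns)
    dist = {
        (i, j): 1.0 - abs(pearson(columns[i], columns[j]))
        for i, j in combinations(range(n), 2)
    }
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return mean(dist[min(i, j), max(i, j)] for i in a for j in b)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters


def starting_points(columns, relevance, n_clusters):
    """Assumption: take the highest-relevance feature of each cluster as a starting point."""
    clusters = cluster_features(columns, n_clusters)
    return sorted(max(c, key=lambda i: relevance[i]) for c in clusters)


# Toy data: four feature columns over five documents (hypothetical values).
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10.5],    # nearly collinear with feature 0
    [5, 1, 4, 2, 3],
    [5.1, 1, 4.2, 2, 3],   # nearly collinear with feature 2
]
relevance = [0.30, 0.35, 0.20, 0.10]  # e.g. a per-feature ranking score
print(starting_points(columns, relevance, 2))
```

A greedy selector could then grow the feature subset from these starting points rather than from one feature. How the paper combines the clusters with its two fast selection algorithms is not described in this record.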
