Abstract
Learning to rank is an interdisciplinary field spanning machine learning and information retrieval that automatically learns ranking models from large labeled training sets. The feature space strongly influences the predictions of a ranking model, yet little learning-to-rank research has addressed feature generation. To address this, we propose a feature processing method. First, the data set is extended by feature recombination based on principal component analysis (PCA). Then the ranking algorithm performs its own implicit feature selection on the extended data set. We evaluate the ListNet ranking algorithm on the LETOR 4.0 data sets (MQ2007, MQ2008) using ranking evaluation measures. We compare ranking performance with and without feature processing, and we examine how the number of added features affects the ranking results. The experimental results show that ranking functions learned on the feature-processed data generally outperform those learned on the original data.
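The abstract does not spell out the recombination procedure, so the following is only a minimal sketch under stated assumptions. It assumes that "feature recombination based on PCA" means appending each document's projections onto the top-k principal components, fitted on the training data, to its original feature vector. It also assumes a linear ListNet scorer trained with the top-one probability cross-entropy loss, and NDCG@10 as the evaluation measure. The function names (fit_pca, extend_features, train_listnet, ndcg_at_k), k = 5, the learning rate, the epoch count, and the synthetic 46-feature data are illustrative assumptions rather than the authors' settings. No LETOR data is loaded.

```python
import numpy as np


def fit_pca(X, k):
    """Fit PCA on training features; return the mean and the top-k directions."""
    mu = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T  # W has shape (d, k)


def extend_features(X, mu, W):
    """Assumed recombination: append k PCA projections to the original features."""
    return np.hstack([X, (X - mu) @ W])


def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()


def train_listnet(queries, n_features, lr=0.01, epochs=200):
    """Linear ListNet with the top-one probability cross-entropy loss.

    For one query: p = softmax(labels), q = softmax(X @ w),
    loss = -sum(p * log q), and the gradient w.r.t. w is X.T @ (q - p).
    """
    w = np.zeros(n_features)
    for _ in range(epochs):
        for X, y in queries:
            w -= lr * (X.T @ (softmax(X @ w) - softmax(y)))
    return w


def ndcg_at_k(scores, labels, k=10):
    """NDCG@k with gain 2^label - 1 and log2(rank + 1) discount."""
    order = np.argsort(-scores)[:k]
    dcg = ((2.0 ** labels[order] - 1) / np.log2(np.arange(2, len(order) + 2))).sum()
    ideal = np.sort(labels)[::-1][:k]
    idcg = ((2.0 ** ideal - 1) / np.log2(np.arange(2, len(ideal) + 2))).sum()
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical synthetic stand-in for LETOR-style data:
# 30 queries x 20 documents, 46 features, graded labels in {0, 1, 2}.
rng = np.random.default_rng(0)
d, k = 46, 5
true_w = rng.normal(size=d)
queries = []
for _ in range(30):
    X = rng.normal(size=(20, d))
    y = np.clip(np.round(X @ true_w / 4 + 1), 0, 2)
    queries.append((X, y))

train, test = queries[:20], queries[20:]
mu, W = fit_pca(np.vstack([X for X, _ in train]), k)  # fit on training data only
ext_train = [(extend_features(X, mu, W), y) for X, y in train]
ext_test = [(extend_features(X, mu, W), y) for X, y in test]

w_orig = train_listnet(train, d)
w_ext = train_listnet(ext_train, d + k)
print("NDCG@10, original:", np.mean([ndcg_at_k(X @ w_orig, y) for X, y in test]))
print("NDCG@10, extended:", np.mean([ndcg_at_k(X @ w_ext, y) for X, y in ext_test]))
```

A note on this design: the appended projections are linear combinations of the original features. A purely linear scorer on the extended vectors therefore represents exactly the same rankings as one on the original vectors, because the constant offset from the mean does not change the order within a query. In this sketch, the extension only changes the parameterization that gradient descent sees. Any NDCG difference comes from training dynamics, not from extra expressiveness.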
Authors
LI Wei-ning (李伟宁)
WANG Lei (王磊)
LI Wei-ning; WANG Lei (School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; School of Electronic Science and Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source
Computer Technology and Development (《计算机技术与发展》), 2018, No. 9, pp. 30-33, 37 (5 pages)
Funding
National High Technology Research and Development Program of China ("863" Program), project 2006AA01Z201