
Application of RF-based Learning-to-rank Method in Movie Dataset

Cited by: 1
Abstract: To address the ranking of films in a self-built movie dataset, this paper proposes a Random Forest-based Bootstrap Self-adaptive Double-ensemble learning-to-rank method (RF-based BSD). First, a movie dataset in learning-to-rank format is built from movie media website data, with 21 constructed features. BSD then automatically determines the sub-sampling ratio of the random forest through a Bootstrap self-adaptive function, based on the number of queries, query-movie pairs, and features in the input dataset. A single-ensemble model (e.g. MART, Multiple Additive Regression Tree) is used as the base learner for training, and the bagging idea is finally applied to output the final double-ensemble model. Experimental results on two evaluation metrics, NDCG (Normalized Discounted Cumulative Gain) and MAP (Mean Average Precision), show that the double-ensemble model output by BSD improves on the single-ensemble model by roughly 1%-3% on both metrics.
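A minimal sketch of how the pipeline described in the abstract could be wired together, assuming Python with scikit-learn. The adaptive_ratio function below is a hypothetical placeholder (the paper's actual Bootstrap self-adaptive formula is not given in the abstract), GradientBoostingRegressor stands in for a MART-style base learner, and the bagging step simply averages the base learners' scores; none of the names or parameter values here come from the paper.

# Illustrative sketch only; adaptive_ratio() is a hypothetical placeholder,
# not the paper's actual Bootstrap self-adaptive function.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for MART
from sklearn.metrics import ndcg_score


def adaptive_ratio(n_queries, n_pairs, n_features):
    """Placeholder: choose a bootstrap sub-sampling ratio from dataset statistics."""
    raw = (n_queries * n_features) / max(n_pairs, 1)  # assumed form, not from the paper
    return float(np.clip(raw, 0.3, 0.8))              # keep the ratio in a sane range


class DoubleEnsembleRanker:
    """Bagging (second ensemble) over MART-style boosted trees (first ensemble)."""

    def __init__(self, n_bags=10, ratio=0.6, seed=0):
        self.n_bags = n_bags
        self.ratio = ratio
        self.rng = np.random.default_rng(seed)
        self.models = []

    def fit(self, X, y):
        n = len(X)
        m = int(self.ratio * n)
        for _ in range(self.n_bags):
            idx = self.rng.choice(n, size=m, replace=True)       # bootstrap subsample
            base = GradientBoostingRegressor(n_estimators=100)   # single-ensemble base learner
            base.fit(X[idx], y[idx])
            self.models.append(base)
        return self

    def predict(self, X):
        # Average the base learners' scores to rank query-movie pairs.
        return np.mean([m.predict(X) for m in self.models], axis=0)


# Example evaluation of one query's ranking with NDCG (graded relevance vs. scores):
# y_true = np.asarray([[3, 2, 0, 1]]); y_pred = ranker.predict(X_query)[None, :]
# print(ndcg_score(y_true, y_pred, k=10))

With labels and predicted scores grouped per query, sklearn.metrics.ndcg_score gives an NDCG@k of the kind used in the paper's evaluation; MAP would be computed analogously from binary relevance labels.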
Authors: HE Qi-hong; LI Xu-jun; SUN Yan (School of Physics and Optoelectronic Engineering, Xiangtan University, Xiangtan 411105, China)
Source: 《电脑与信息技术》 (Computer and Information Technology), 2021, No. 5, pp. 1-6 (6 pages)
Keywords: random forest; learning-to-rank; movie dataset; Bootstrap sub-sampling; double-ensemble model