Abstract
To address the problem of ranking movies in a self-built movie dataset, this paper proposes a Random-Forest-based Bootstrap Self-adaptive Double-ensemble learning-to-rank method (RF-based BSD). First, 21 features are constructed from movie media website data to build a movie dataset in learning-to-rank format. Given an input dataset, BSD uses a Bootstrap self-adaptive function to determine the RF sub-sampling ratio automatically from the number of queries, the number of query-movie pairs, and the number of features. A single-ensemble model (for example MART, Multiple Additive Regression Tree) is then trained as the base learner, and finally the bagging idea is applied to output the double-ensemble model. Experiments evaluated with two metrics, NDCG (Normalized Discounted Cumulative Gain) and MAP (Mean Average Precision), show that the double-ensemble model output by BSD improves on the single-ensemble model by about 1%-3% on both metrics.
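The two evaluation metrics named in the abstract, NDCG and MAP, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the exponential gain (2^rel - 1) with a log2 discount is one common NDCG convention and is assumed here, MAP treats any label greater than 0 as relevant, and all function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items, given in ranked order.

    Assumes the exponential-gain convention: gain = 2^rel - 1,
    discount = log2(position + 1) with positions starting at 1.
    """
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision(relevances):
    """Average precision for one query; labels > 0 count as relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_over_queries(metric, ranked_lists):
    """Average a per-query metric over all queries (MAP when metric is AP)."""
    return sum(metric(r) for r in ranked_lists) / len(ranked_lists)
```

Each input list holds the relevance labels of one query's movies in the order the model ranked them, so `mean_over_queries(average_precision, lists)` gives MAP and `mean_over_queries(lambda r: ndcg_at_k(r, 10), lists)` gives mean NDCG@10.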
Authors
HE Qi-hong; LI Xu-jun; SUN Yan (School of Physics and Optoelectronic Engineering, Xiangtan University, Xiangtan 411105, China)
Source
Computer and Information Technology (《电脑与信息技术》), 2021, No. 5, pp. 1-6 (6 pages)