Abstract
Applying data mining techniques to extract features from the massive body of literature on knowledge service platforms, and using machine learning algorithms to recommend literature, can help users quickly obtain useful information. The ranking method of a logistic regression-based literature recommendation system treats literature recommendation as a classification problem and adopts a logistic regression model as the ranking model to score and recommend literature. The article proposes determining the learning objective and the sampling scheme by analyzing the recommendation scenario and the distribution of user logs, then analyzing literature-side, institution-side, author-side, and interaction features, performing feature selection to construct a feature data set, and fitting that data set with a logistic regression model. Online traffic was divided into multiple groups of equal size and the trained model was evaluated in an online controlled experiment, which found that the click-through rate increased significantly. This indicates that the method can use multiple kinds of features, such as literature, user, and context features, to rank literature in a personalized way by predicting the probability of a positive sample. The effect is significant and the training and engineering costs are low, making it a low-investment solution that pays off quickly.
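The ranking idea summarized in the abstract (treat a click as the positive class, fit a logistic regression model, and sort candidate documents by predicted click probability) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the three feature names, the synthetic click log, the hand-written gradient-descent trainer, and the document ids are all hypothetical.

```python
import math
import random

# Hypothetical feature names; the article's actual feature set is not given here.
FEATURES = ["topic_match", "author_followed", "institution_match"]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, x):
    # Predicted probability that the sample is positive (clicked).
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train(samples, labels, lr=1.0, epochs=500):
    # Full-batch gradient descent on the mean log loss.
    n, n_features = len(samples), len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def rank(weights, bias, candidates):
    # Sort candidate documents by predicted click probability, highest first.
    scored = [(predict(weights, bias, x), doc_id) for doc_id, x in candidates.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

# Synthetic click log (assumption): clicks depend on the first two features only.
rng = random.Random(0)
samples, labels = [], []
for _ in range(300):
    x = [rng.random(), float(rng.random() < 0.5), rng.random()]
    click_prob = sigmoid(4.0 * x[0] + 1.5 * x[1] - 2.75)
    samples.append(x)
    labels.append(1 if rng.random() < click_prob else 0)

weights, bias = train(samples, labels)

# Hypothetical candidate documents for one user request.
candidates = {
    "doc_a": [0.9, 1.0, 0.5],
    "doc_b": [0.1, 0.0, 0.5],
    "doc_c": [0.5, 0.0, 0.5],
}
ranking = rank(weights, bias, candidates)
print(ranking)
```

In a production setting the trainer would more likely come from an established library and the features from logged user behavior. The sketch only shows how a classification score doubles as a ranking score.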
Authors
Zhang Liang (张良)
Jiang Cheng (江程)
Xiao Yintao (肖银涛)
Wang Xianchen (王现臣)
Tongfang Knowledge Network Digital Publishing Technology Co., Ltd., Beijing 100192
Source
China-Arab States Science and Technology Forum (《中阿科技论坛(中英文)》)
2024, No. 6, pp. 87-91 (5 pages)
Keywords
Logistic regression
Machine learning
Literature recommendation system
Personalized ranking
Click-through rate