Abstract
The rapid development of the mobile Internet has brought massive volumes of user behavior logs to online reading systems. To handle user behavior log data that is growing to the terabyte and even petabyte scale, this paper designs a user model and a user clustering scheme based on the Hive data warehouse. The method accurately characterizes users' multi-dimensional, multi-scale preference features from their reading behavior, builds a dynamic user requirement model, and clusters users by these features to partition them into groups, serving personalized Web applications such as recommendation, search, and advertisement delivery. Experimental results show that the method exploits the storage and computing strengths of a Hadoop cluster and achieves good performance and execution speed.
Source
Computer Knowledge and Technology (《电脑知识与技术(过刊)》)
2015, Issue 11X, pp. 45-48 (4 pages)
Funding
Supported by the Science and Technology Support Program of the Ministry of Science and Technology (Grant No. 2012BAH95F03)