
User Modeling and Clustering for Online Reading Systems Based on the Hive Data Warehouse
Abstract: The rapid growth of the mobile Internet has brought massive volumes of user behavior logs to online reading systems. To handle user behavior logs that are growing to the terabyte and even petabyte scale, this paper designs a user model and a user clustering scheme based on the Hive data warehouse. The method accurately characterizes users' multi-dimensional, multi-scale preference features from their reading behavior, builds a dynamic user requirement model, and clusters users by those features to partition them into groups, serving personalized Web applications such as recommendation, search, and advertisement delivery. Experimental results show that the method exploits the storage and computing strengths of the Hadoop cluster and achieves good performance and execution speed.
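Source: Computer Knowledge and Technology (《电脑知识与技术(过刊)》), 2015, No. 11X, pp. 45-48 (4 pages)
Funding: Supported by the Science and Technology Support Program of the Ministry of Science and Technology (2012BAH95F03)
Keywords: Hive; data warehouse; online reading; user model; user clustering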

