
Algorithm of discovering preferred browsing paths based on cloud-computing (基于云计算的用户浏览偏爱路径挖掘算法)

Cited by: 6
Abstract: Mining users' preferred browsing paths from Web logs is an important research topic. Existing mining algorithms focus on objective access frequency and ignore whether users are actually interested in a frequently visited path. After analyzing the problems of current preferred-path mining algorithms, this paper uses the website's topology graph to revise the frequency-based measure of preferred paths and introduces the concept of useful preference, which removes the influence of factors such as page placement and links on the mining results. To address the insufficient computing power of current single-node mining systems, it exploits the distributed processing and virtualization of cloud computing to give a cloud-based data processing method, on top of which the preferred browsing paths are mined. Experiments show that, on large volumes of log data, the algorithm improves on ordinary frequency-based preferred-path mining in both accuracy and efficiency.
Author: 程苗 (Cheng Miao)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 29, pp. 85-89 (5 pages)
Funding: Doctoral Program Fund project (No. 200803580024); Science Fund for Creative Research Groups (No. 70821001)
Keywords: preferred browsing paths; cloud computing; Web usage mining; Web log
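
The abstract describes two ideas: counting path visits in a distributed fashion, and correcting raw frequency with the site topology. The following is a minimal sketch of both, not the paper's method: the abstract does not give the formula for useful preference, so the score used here (observed transitions divided by the number expected if visitors chose uniformly among a page's outgoing links) is an assumption, as are the names `TOPOLOGY`, `SHARDS`, `map_shard`, and `preference`, and the sample data. The map and reduce steps run in one process purely to show the shape of the computation.

```python
from collections import Counter
from functools import reduce

# Hypothetical site topology: page -> set of pages it links to.
TOPOLOGY = {
    "home": {"news", "products", "about"},
    "news": {"article"},
    "products": {"item"},
}

# Hypothetical sessions already extracted from a Web log, split into
# shards as they might be distributed across worker nodes.
SHARDS = [
    [["home", "news", "article"], ["home", "products", "item"]],
    [["home", "news", "article"], ["home", "news"]],
]


def map_shard(sessions):
    """Map step: count page-to-page transitions within one shard."""
    counts = Counter()
    for session in sessions:
        counts.update(zip(session, session[1:]))
    return counts


def reduce_counts(left, right):
    """Reduce step: merge the transition counts of two shards."""
    return left + right


def preference(counts, topology):
    """Assumed score: observed count relative to a uniform choice
    among the source page's outgoing links (1.0 means no preference)."""
    out_totals = Counter()
    for (src, _dst), n in counts.items():
        out_totals[src] += n
    scores = {}
    for (src, dst), n in counts.items():
        fanout = len(topology.get(src, ())) or 1
        scores[(src, dst)] = n * fanout / out_totals[src]
    return scores


total = reduce(reduce_counts, map(map_shard, SHARDS), Counter())
scores = preference(total, TOPOLOGY)
# Transitions chosen more often than a uniform choice would predict.
preferred = sorted(edge for edge, s in scores.items() if s > 1.0)
print(preferred)
```

In this toy data the transition from `home` to `news` is taken three times out of four visits to a page with three outgoing links, so it scores above 1.0, while a transition from a page with a single link scores exactly 1.0 however often it is taken. That illustrates why a topology-aware measure can discount paths that are frequent only because of how pages are linked.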