Application of Apriori Algorithm in Finding User’s Webpage Browsing Mode (Apriori算法在发现用户网页浏览模式上的应用)


Abstract: Web server log files record a large volume of information about users' page visits. Analyzing these data to discover users' browsing patterns, such as the pages users are interested in and the best page combinations, and thereby giving merchants sound decision support, has become increasingly important. This paper applies the Apriori algorithm, a data mining technique, to log data recording users' page visits in order to obtain their browsing patterns. The log data are first preprocessed to extract the page access records of each user session. The Apriori algorithm is then used to mine these records and, in view of the characteristics of the data, the step that matches k-item candidate sets against transactions is improved. Experimental results show that the improved algorithm performs better than the traditional algorithm when processing very large data sets. Finally, the rules produced by mining are analyzed, revealing how users browse some pages of the site studied; these browsing patterns can provide decision support for merchants.
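The pipeline summarized in the abstract (session extraction followed by Apriori mining) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record layout `(user, timestamp, page)`, the 30-minute session timeout, the function names `sessionize` and `apriori`, and the sample data are all assumptions. The paper's specific improvement to candidate/transaction matching is not described in this record, so the sketch uses plain subset testing plus a generic length check.

```python
from collections import defaultdict
from itertools import combinations


def sessionize(records, timeout=1800):
    """Group (user, timestamp_seconds, page) records into sessions.

    A new session starts when the gap between two consecutive requests of
    the same user exceeds `timeout` (an assumed 30-minute threshold).
    Each session is returned as a frozenset of visited pages.
    """
    by_user = defaultdict(list)
    for user, ts, page in records:
        by_user[user].append((ts, page))
    sessions = []
    for visits in by_user.values():
        visits.sort()
        current, last_ts = set(), None
        for ts, page in visits:
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(frozenset(current))
                current = set()
            current.add(page)
            last_ts = ts
        if current:
            sessions.append(frozenset(current))
    return sessions


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support."""
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        counts = defaultdict(int)
        for t in transactions:
            if len(t) < k:
                continue  # too short to contain any k-itemset
            for c in candidates:
                if c <= t:  # candidate is a subset of the session
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical sample log: (user, timestamp in seconds, page)
records = [
    ("u1", 0, "/home"), ("u1", 60, "/list"), ("u1", 120, "/item"),
    ("u2", 10, "/home"), ("u2", 50, "/list"),
    ("u1", 5000, "/home"), ("u1", 5030, "/item"),
    ("u3", 20, "/list"), ("u3", 40, "/item"), ("u3", 90, "/home"),
]
sessions = sessionize(records)
frequent_sets = apriori(sessions, min_support=3)
```

With this sample data, user `u1` yields two sessions because of the long gap between requests, and page pairs that appear together in at least three sessions are reported as frequent. Treating a session as an unordered set of pages discards visit order, which suits association-rule mining but would not suit sequence mining.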

Authors: Wei Lin (魏林), Liu Jianyi (刘建毅), Wang Cong (王枞)

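Affiliations: School of Computer Science, Beijing University of Posts and Telecommunications; School of Software Engineering, Beijing University of Posts and Telecommunications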

Source: Software Engineering and Applications (《软件工程与应用》), 2013, No. 6, pp. 125-130 (6 pages)

Funding: Supported by the municipal Party committee and municipal government.

Keywords: Web log; Apriori algorithm; Web log mining; session identification; k-item candidate set

Classification: TP39 [Automation and Computer Technology: Computer Application Technology]





