An Efficient Algorithm for Web Log Mining (cited by: 2)

An Efficient Algorithm with Incremental Data Mining for Web Usage Mining

Abstract: Mining server access data can yield significant and useful information. This paper presents PWU, a web usage mining algorithm built on a data structure called the heterogeneous tree. The leaf nodes of the tree are numbered according to a set of rules, so that counting the support of a candidate only requires counting the leaf nodes that carry the same number, which greatly simplifies candidate counting. Building on this structure, the algorithm also supports incremental mining. Finally, theoretical analysis and experiments show that the algorithm is efficient and that its incremental mining is both efficient and complete.
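The abstract does not spell out the heterogeneous tree or its leaf-numbering rules, so the following is only a minimal sketch of the two general ideas it mentions: support counting for candidate access paths, and incremental updating when new log sessions arrive. It is not the PWU algorithm. The class `PathCounter`, its methods, the `max_len` and `min_count` parameters, and the sample sessions are all hypothetical names and data chosen for illustration.

```python
from collections import Counter


class PathCounter:
    """Support counts for contiguous page paths in web sessions.

    Illustrative only: this is a plain counting scheme, NOT the PWU
    algorithm or its heterogeneous tree, whose details the abstract
    does not give.
    """

    def __init__(self, max_len=3):
        self.max_len = max_len      # longest path length that is counted
        self.counts = Counter()     # path (tuple of pages) -> number of sessions
        self.n_sessions = 0

    def add_sessions(self, sessions):
        """Fold new sessions into the stored counts (incremental update)."""
        for session in sessions:
            seen = set()            # count each path at most once per session
            for i in range(len(session)):
                last = min(i + self.max_len, len(session))
                for j in range(i + 1, last + 1):
                    seen.add(tuple(session[i:j]))
            self.counts.update(seen)
            self.n_sessions += 1

    def frequent(self, min_count):
        """Return the paths supported by at least min_count sessions."""
        return {p: c for p, c in self.counts.items() if c >= min_count}


# Hypothetical sessions: each list is the ordered pages of one visit.
counter = PathCounter(max_len=2)
counter.add_sessions([["A", "B", "C"], ["A", "B"], ["B", "C"]])
before = counter.frequent(min_count=2)

# New log data arrives; only the new sessions are scanned.
counter.add_sessions([["A", "C"], ["A", "C"]])
after = counter.frequent(min_count=3)
```

Because every path count is retained, the result after the update matches a full recount over all five sessions, which is the completeness property in its simplest form. The cost is memory: the sketch stores counts for infrequent paths too, which a purpose-built structure would presumably avoid.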

Author: ZHANG Bing

Affiliation: School of Economics and Management, Southeast University

Source: Journal of Guangxi Normal University: Natural Science Edition (indexed in CAS; Peking University Core Journal), 2006, No. 1, pp. 26-29 (4 pages)

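Funding: National Natural Science Foundation of China (60463003); Science and Technology Development Program of the Beijing Municipal Education Commission (KM200510016002)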

Keywords: web usage mining; PWU algorithm; incremental data mining

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
