
The Clementine Solution for Web Log Preprocessing (Web日志预处理的Clementine方案)
Cited by: 5
Abstract: This paper presents a preliminary preprocessing data stream for Web logs built in Clementine. The stream implements four main procedures: data cleaning, user identification, session identification, and path completion. It also provides auxiliary functions such as log merging, data auditing, standardized coding, and association with external information. Experimental results indicate that preprocessing Web logs with Clementine is entirely feasible, which lays a foundation for carrying out the subsequent mining work on the same platform. To some extent, this resolves the difficulty of having Web log preprocessing and mining handled by different tools, and it raises the degree of automation of Web log mining.
Authors: Zheng Huixia (郑慧霞), Xu Shuo (徐硕)
Source: Journal of Medical Informatics (《医学信息学杂志》), CAS, 2009, No. 12, pp. 33-36, 40 (5 pages in total)
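Funding: Special Fund for Basic Scientific Research of the Institute of Medical Information, Chinese Academy of Medical Sciences, project "Analysis of Library Website Reader Behavior Based on Web Log Statistics" (project No. 08R0130)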
Keywords: Clementine; Web log preprocessing; data stream