Research on Data Preprocessing Technology in Web Log Mining

Cited by: 9
Abstract: Data preprocessing plays a crucial role in Web log mining and directly affects the quality and results of the mining. This paper analyzes the data preprocessing process in detail and proposes an improved data cleaning method to raise the efficiency of preprocessing. For session identification, an important step in Web log data preprocessing, it also proposes an improved session identification method. After user identification, the importance of each page is determined from the page content and the site structure, and the page access time threshold is adjusted accordingly. Link pages and pages the user is not interested in are then removed from each session according to the user's degree of interest in the page content. Experimental results show that the proposed method determines the page access time threshold more accurately and yields a more reasonable and effective set of sessions.
Source: Computer Technology and Development (《计算机技术与发展》), 2010, No. 5, pp. 47-50 (4 pages)
Funding: National Natural Science Foundation of China (60736014)
Keywords: Web log mining; data preprocessing; session identification; data cleaning
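
The abstract names three steps (data cleaning, threshold-based session identification with a page-importance adjustment, and interest-based pruning) but gives no formulas or parameter values. The following Python sketch is therefore only an illustration of the general idea, not the authors' method. The record layout, the 10-minute base threshold, the multiplicative importance weight, the list of static file suffixes, and the caller-supplied interest scores are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed values: the abstract does not state a threshold or a file-type list.
BASE_THRESHOLD = timedelta(minutes=10)
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def clean(records):
    """Data cleaning: keep successful GET page requests, drop embedded resources."""
    return [
        r for r in records
        if r["method"] == "GET"
        and 200 <= r["status"] < 300
        and not r["url"].lower().endswith(STATIC_SUFFIXES)
    ]


def identify_sessions(records, importance, base=BASE_THRESHOLD):
    """Split one user's requests into sessions with a per-page time threshold.

    Assumption: the gap allowed after a page is base * importance[page], so a
    page judged more important tolerates a longer stay before the session ends.
    Pages without a weight use 1.0.
    """
    sessions, current = [], []
    for rec in sorted(records, key=lambda r: r["time"]):
        if current:
            prev = current[-1]
            limit = base * importance.get(prev["url"], 1.0)
            if rec["time"] - prev["time"] > limit:
                sessions.append(current)
                current = []
        current.append(rec)
    if current:
        sessions.append(current)
    return sessions


def prune(session, interest, min_interest=0.5):
    """Drop pages whose caller-supplied interest score is below min_interest.

    Pages without a score are kept (treated as 1.0).
    """
    return [r for r in session if interest.get(r["url"], 1.0) >= min_interest]


# Hypothetical log entries for a single, already identified user.
t0 = datetime(2010, 5, 1, 9, 0, 0)


def entry(url, minutes, status=200):
    return {"url": url, "method": "GET", "status": status,
            "time": t0 + timedelta(minutes=minutes)}


log = [
    entry("/index.html", 0),
    entry("/logo.gif", 0),
    entry("/paper.html", 2),
    entry("/news.html", 17),
    entry("/missing.html", 18, status=404),
    entry("/about.html", 40),
]

importance = {"/paper.html": 2.0}  # assumed weight for a content-rich page
sessions = identify_sessions(clean(log), importance)
session_urls = [[r["url"] for r in s] for s in sessions]
print(session_urls)
```

With the assumed weight of 2.0 on `/paper.html`, the 15-minute gap before `/news.html` stays inside the first session, giving two sessions. With no weights, the same log splits into three, which is the kind of difference a page-dependent threshold is meant to capture.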
