
Web Log Data Preprocessing Based on Partially Supervised Learning (cited by: 1)
Abstract: For Web data collected on the client side, a data preprocessing method based on partially supervised learning is proposed. The main tasks of data cleaning and the basic characteristics of the sample data are analyzed first, and the data cleaning is then carried out with a partially supervised learning approach. The method has two core steps: (1) rule-based learning labels the positive examples, that is, it obtains the positive examples in the training set; (2) an SVM classifier is built to label the positive examples in the test set.
Source: Journal of Inner Mongolia University (Natural Science Edition), 2015, No. 1, pp. 86-91 (6 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
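Funding: National Natural Science Foundation of China (Grant No. 61063018); Scientific Research Project of Higher Education Institutions of Inner Mongolia (Grant No. NJZY14334)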
Keywords: data preprocessing; Web log mining; rule; partially supervised learning
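
The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual rules, features, or SVM configuration, so everything concrete here is an assumption made for illustration: the record layout `(url, dwell_seconds)`, the rule in `rule_is_positive`, the two features, the choice to treat records not matched by the rule as negatives, and the small subgradient-descent trainer standing in for a real SVM solver.

```python
# Toy sketch only: the record fields, the rule, and the features below are
# illustrative assumptions, not the rules or features used in the paper.

def rule_is_positive(record):
    """Step 1 (hypothetical rule): an .html request viewed for at least 5 s."""
    url, dwell_seconds = record
    return url.endswith(".html") and dwell_seconds >= 5

def features(record):
    """Map a record to [is_page, dwell time capped at 60 s, scaled to 0..1]."""
    url, dwell_seconds = record
    is_page = 1.0 if url.endswith((".html", ".htm")) else 0.0
    return [is_page, min(dwell_seconds, 60) / 60.0]

def train_linear_svm(xs, ys, lam=0.01, lr=0.05, steps=2000):
    """Linear soft-margin SVM fitted by batch subgradient descent on the
    L2-regularized hinge loss (a simple stand-in for a real SVM solver)."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(xs, ys):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def fit_from_rules(train_records):
    """Step 1 labels the training set with the rule; step 2 trains on those
    labels. Records the rule does not match are treated as negatives,
    which is an assumption of this sketch."""
    ys = [1 if rule_is_positive(r) else -1 for r in train_records]
    xs = [features(r) for r in train_records]
    return train_linear_svm(xs, ys)

def predict(model, record):
    """Step 2: return +1 (positive) or -1 for an unseen record."""
    w, b = model
    score = sum(wj * xj for wj, xj in zip(w, features(record))) + b
    return 1 if score > 0 else -1

# Made-up training and test records.
TRAIN = [
    ("/index.html", 30), ("/news/a.html", 45), ("/about.html", 12),
    ("/img/logo.png", 0), ("/css/site.css", 0), ("/js/app.js", 0),
    ("/favicon.ico", 0),
]
TEST = [("/news/b.htm", 20), ("/img/photo.jpg", 0)]

if __name__ == "__main__":
    model = fit_from_rules(TRAIN)
    for record in TEST:
        print(record, predict(model, record))
```

In this toy data the rule only recognizes `.html` requests, while the feature vector also flags `.htm`, so the trained classifier can mark a test record as positive even though the narrow rule would not match it. That is the general motivation for following a rule-based labeling pass with a learned classifier; whether the paper's rules and features behave this way is not stated in the abstract.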
