Abstract
For Web data collected on the client side, a data preprocessing method based on partially supervised learning is proposed. The main tasks of data cleaning and the basic characteristics of the sample data are first analyzed, and data cleaning is then carried out with a partially supervised learning approach. The method has two core steps: (1) rule-based learning labels the positive examples, that is, it obtains the positive examples in the training set; (2) an SVM classifier is built to label the positive examples in the test set.
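The two steps can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: the record fields (dwell time in seconds, click count), the labeling rule and the feature scaling are all hypothetical; training records not matched by the rule are simply treated as negatives, which is the crudest form of positive/unlabeled learning; and the SVM is a linear one trained with the Pegasos stochastic subgradient method in pure Python rather than with a library solver.

```python
import random
from typing import List, Tuple

# Hypothetical client-side Web log record: (dwell_seconds, clicks).
Record = Tuple[float, int]


def rule_label(r: Record) -> int:
    """Step (1): hypothetical labeling rule. A record is a positive example (+1)
    if the user stayed at least 10 s and clicked at least twice; every other
    training record remains unlabeled and is treated as -1 in this sketch."""
    dwell, clicks = r
    return 1 if dwell >= 10 and clicks >= 2 else -1


def features(r: Record) -> List[float]:
    """Hypothetical scaling of the raw fields to similar ranges, plus a constant 1
    so the bias is learned (and regularized) as an ordinary weight."""
    dwell, clicks = r
    return [(dwell - 10.0) / 10.0, (clicks - 2.0) / 2.0, 1.0]


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.1,
                     epochs: int = 100, seed: int = 0) -> List[float]:
    """Step (2): linear SVM trained with Pegasos, i.e. stochastic subgradient
    descent on lam/2 * ||w||^2 + mean hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violates = y[i] * dot(w, X[i]) < 1.0  # hinge loss active at current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrinks w
            if violates:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: List[float], r: Record) -> int:
    return 1 if dot(w, features(r)) >= 0 else -1


# Hypothetical training set: the rule marks the positives (step 1).
train = [(25.0, 5), (18.0, 3), (30.0, 4), (22.0, 6),
         (1.0, 0), (0.5, 0), (2.0, 1), (3.0, 0)]
labels = [rule_label(r) for r in train]

# Train the SVM on rule-labeled data, then label the test set (step 2).
w = train_linear_svm([features(r) for r in train], labels)
test = [(28.0, 5), (1.5, 0)]
predictions = [predict(w, r) for r in test]
print(predictions)
```

The hand-written trainer only keeps the sketch dependency-free; in practice a library solver such as scikit-learn's `LinearSVC` or `SVC` would normally take its place, and a real partially supervised method would refine the "unlabeled means negative" assumption.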
Source
Journal of Inner Mongolia University (Natural Science Edition)
CAS
Peking University Core Journal
2015, No. 1, pp. 86-91 (6 pages)
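Funding
National Natural Science Foundation of China (Grant No. 61063018)
Scientific Research Project of Higher Education Institutions of Inner Mongolia (Grant No. NJZY14334)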