Abstract
Automatic classification is an important research area in data mining and machine learning. To address the difficulty of obtaining a large labeled training set, an incremental Bayes classification method based on a small training set is proposed, and its underlying principle, parameter calculation, and algorithm are given. The algorithm handles two cases. In the first, the new sample carries a class label: the current classifier is used to check that label, and if the prediction matches, the current classifier is kept; otherwise the new sample is used to revise the classifier. In the second, the new sample has no class label: the current classifier assigns it a label, and the newly labeled sample is then used to revise the classifier. Experimental results show that the algorithm is feasible and effective and achieves higher accuracy than the naive Bayes classification algorithm. The incremental Bayes classification algorithm offers a new way to update a classifier.
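The two cases described in the abstract can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the abstract does not give the parameter formulas, so a count-based multinomial naive Bayes model with Laplace smoothing is assumed here, and the class name `IncrementalNaiveBayes`, the method `learn_incrementally`, and the smoothing constant `alpha` are hypothetical names chosen for the example.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Count-based naive Bayes that can absorb one sample at a time.

    Assumption: multinomial model with Laplace smoothing; the paper's
    actual parameter update rules are not stated in the abstract.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(lambda: defaultdict(int))
        self.feature_totals = defaultdict(int)
        self.vocabulary = set()

    def update(self, features, label):
        """Revise the classifier by adding one sample to the counts."""
        self.class_counts[label] += 1
        for f in features:
            self.feature_counts[label][f] += 1
            self.feature_totals[label] += 1
            self.vocabulary.add(f)

    def predict(self, features):
        """Return the most probable label, or None if nothing was learned."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary) or 1
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = self.feature_totals[label] + self.alpha * vocab_size
            for f in features:
                seen = self.feature_counts[label].get(f, 0)
                score += math.log((seen + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def learn_incrementally(self, features, label=None):
        """Process one new sample according to the two cases."""
        predicted = self.predict(features)
        if label is None:
            # Case 2: unlabeled sample. Label it with the current
            # classifier, then use it to revise the classifier.
            if predicted is not None:
                self.update(features, predicted)
            return predicted
        # Case 1: labeled sample. Keep the classifier if its prediction
        # matches the given label; otherwise revise it with the sample.
        if predicted != label:
            self.update(features, label)
        return label


# Small usage example with made-up data.
clf = IncrementalNaiveBayes()
clf.update(["goal", "match"], "sport")
clf.update(["vote", "election"], "politics")
clf.learn_incrementally(["match"], "sport")  # labeled sample
clf.learn_incrementally(["vote"])            # unlabeled sample
```

Storing raw counts rather than probabilities is what makes the update cheap in this sketch: absorbing a sample only increments a few counters, and probabilities are recomputed at prediction time.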
Source
《沈阳工业大学学报》
EI
CAS
2006, No. 4, pp. 422-425, 433 (5 pages in total)
Journal of Shenyang University of Technology
Funding
Supported by the National Natural Science Foundation of China (10471096)
Keywords
incremental learning
Bayes classification
class label
classification algorithm
Bayesian network
naive Bayes