Abstract
Most automatic text classification methods, including those for Chinese text, now rely on machine learning, and they often neglect the effective use of traditional rule-based classification. This paper combines the two approaches in a single classifier by treating rule-based judgment as a component classifier. It proposes a two-layer text classification model supplemented by rules, together with an optimized classification rule learning algorithm that automatically generates "strong" decision rules. Based on this method, a two-layer classifier combining rules with N-Gram statistical classification was designed and implemented, and it was compared experimentally with a standalone N-Gram classifier. The results show that the rule-supplemented two-layer classifier achieves better classification performance than the N-Gram classifier alone.
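The abstract does not say how the two layers are wired together, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the rule layer runs first and that the N-Gram layer is consulted only when no rule fires. It also substitutes a simple naive Bayes model over character n-grams for the paper's N-Gram classifier. All names (`NGramClassifier`, `TwoLayerClassifier`, the toy rule and the toy training data) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramClassifier:
    """Assumed stand-in: naive Bayes over character n-grams, add-one smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.gram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            grams = char_ngrams(text, self.n)
            self.gram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoLayerClassifier:
    """Layer 1: decision rules. Layer 2: statistical fallback (assumed order)."""

    def __init__(self, rules, fallback):
        self.rules = rules        # list of (required_terms, label) pairs
        self.fallback = fallback

    def predict(self, text):
        for terms, label in self.rules:
            if all(term in text for term in terms):
                return label      # a rule fired, so the N-Gram layer is skipped
        return self.fallback.predict(text)


# Toy data, invented purely for illustration.
train_docs = ["the team won the football match", "stocks fell as the market closed",
              "the striker scored a late goal", "the bank raised interest rates"]
train_labels = ["sports", "finance", "sports", "finance"]
ngram = NGramClassifier(n=2).fit(train_docs, train_labels)
clf = TwoLayerClassifier(rules=[(("interest", "rates"), "finance")], fallback=ngram)
```

Calling `clf.predict(text)` returns the rule's label whenever every term of some rule occurs in `text`, and otherwise defers to the n-gram model. Other combinations, such as letting the rules correct only low-confidence statistical decisions, would be equally consistent with the abstract.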
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2007, No. 8, pp. 165-167 (3 pages)
Keywords
Text classification
Document index
Classification rule learning