
Text Categorization Based on Classification Rules Tree by Frequent Patterns (基于分类规则树的频繁模式文本分类)

Cited by: 19
Abstract: Associative classification based on frequent patterns is a recently proposed approach that builds classification rules from the patterns occurring frequently in each category and uses those rules to classify new texts. Existing associative classification methods have two shortcomings when applied to text categorization. First, the frequent patterns used to build the rules record only whether a feature word occurs in a text, and so ignore how often it occurs. Second, when many rules are generated, they must be pruned to keep classification efficient, and pruning noticeably reduces classification accuracy. To address both problems, this paper proposes a text categorization method based on a classification rule tree and frequent patterns with term frequency. The results show that introducing term frequency improves the accuracy of associative classification, and that the classification rule tree markedly shortens classification time without lowering classification quality. Together, the two measures remedy the shortcomings of existing associative classification for text. A comparison with three typical text classification methods shows that, in a low-dimensional feature space, associative classification outperforms Bayes, kNN (k nearest neighbor), and SVM (support vector machines), which makes it a promising text categorization method.
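Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2006, No. 5, pp. 1017-1025 (9 pages)
Funding: National Natural Science Foundation of China; Science and Technology Fund of the Fujian Provincial Department of Education
Keywords: frequent pattern; text categorization; term frequency; association rule; classification rule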
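
The abstract names three ingredients: frequent patterns that carry term frequency, classification rules built from them, and a rule tree that speeds up matching. The record gives no algorithmic detail, so the following is only a minimal sketch of how such pieces could fit together, not the authors' method. Everything specific here is an assumption made for illustration: the `term@k` item encoding (the term occurs at least k times), the brute-force mining, the thresholds, the trie layout, and scoring by summed confidence.

```python
from collections import Counter, defaultdict
from itertools import combinations

def to_items(tokens, max_level=2):
    """Encode a text as sorted items 'term@k': the term occurs at least k times (assumed encoding)."""
    tf = Counter(tokens)
    return sorted(f"{t}@{k}" for t, n in tf.items()
                  for k in range(1, min(n, max_level) + 1))

def mine_rules(docs, min_support=0.5, min_conf=0.6, max_len=2):
    """Brute-force mining of rules (pattern, label, confidence); thresholds are illustrative."""
    class_size = Counter(label for _, label in docs)
    per_class = defaultdict(Counter)  # label -> pattern -> number of texts containing it
    overall = Counter()               # pattern -> number of texts containing it, all classes
    for tokens, label in docs:
        items = to_items(tokens)
        for n in range(1, max_len + 1):
            for pat in combinations(items, n):
                per_class[label][pat] += 1
                overall[pat] += 1
    rules = []
    for label, counts in per_class.items():
        for pat, c in counts.items():
            conf = c / overall[pat]
            if c / class_size[label] >= min_support and conf >= min_conf:
                rules.append((pat, label, conf))
    return rules

def build_rule_tree(rules):
    """Store rules in a trie keyed by the sorted items of each pattern."""
    root = {"children": {}, "rules": []}
    for pat, label, conf in rules:
        node = root
        for item in pat:
            node = node["children"].setdefault(item, {"children": {}, "rules": []})
        node["rules"].append((label, conf))
    return root

def classify(root, tokens):
    """Walk the trie along the text's items; sum the confidence of every matching rule per class."""
    items = to_items(tokens)
    scores = defaultdict(float)

    def walk(node, start):
        for label, conf in node["rules"]:
            scores[label] += conf
        for i in range(start, len(items)):
            child = node["children"].get(items[i])
            if child is not None:
                walk(child, i + 1)

    walk(root, 0)
    return max(scores, key=scores.get) if scores else None

# Toy training set (made up for this sketch).
docs = [
    (["goal", "goal", "match"], "sports"),
    (["goal", "team", "match"], "sports"),
    (["stock", "stock", "market"], "finance"),
    (["stock", "market", "match"], "finance"),
]
tree = build_rule_tree(mine_rules(docs))
print(classify(tree, ["goal", "goal", "team"]))
```

Because patterns and texts are both kept in sorted item order, the walk only follows branches whose items actually occur in the text, so no pass over the full rule list is needed. That is one plausible reading of why a rule tree could reduce classification time without discarding rules.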
