
Ensemble Text Classification Model Supplemented by Strong Rules Learning (一种辅以强规则学习的双层文本分类模型) Cited by: 3
Abstract: As machine-learning-based automatic text classification has become the mainstream approach, such methods often neglect effective use of rule-based classification. This paper combines rule-based classification with machine-learning-based classification, treating rule-based decision as a component classifier. It proposes a two-layer text classification model supplemented by rules, together with an optimized rule induction algorithm that automatically generates "strong" decision rules. Based on this approach, a two-layer classifier combining rules with N-Gram statistical classification was designed and implemented. Experiments comparing the two-layer model with a standalone N-Gram model show that the rule-supplemented two-layer classifier achieves better classification performance.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心), 2007, No. 8, pp. 165-167 (3 pages)
Keywords: Text classification; Document index; Classification rules learning













