
TFIDF-NB Co-operative Training Algorithm (cited by: 1)
Abstract: Text classification that combines a small set of labeled documents with a large pool of unlabeled documents has become an important research direction. After analyzing the EM and co-training algorithms, this paper proposes a new co-operative training algorithm. The algorithm uses two classifiers, naive Bayes (NB) and TFIDF, and trains them co-operatively and incrementally on a small set of labeled documents and a large set of unlabeled documents. Experimental results show that the co-operative training algorithm achieves higher accuracy and a lower average error rate than EM and co-training.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core, 2004, No. 12, pp. 2243-2246 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (60272051)
Keywords: text classification; semi-supervised algorithm; co-training algorithm; EM algorithm; co-operative incremental training
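
The abstract describes the idea only at a high level: a naive Bayes classifier and a TFIDF classifier are trained together, and unlabeled documents are folded into the training set incrementally. The Python sketch below is one possible reading of that idea, not the paper's procedure. Everything in it is an assumption made for illustration: the class names `NaiveBayes` and `TfidfCentroid`, the function `cooperative_train`, the centroid-style TFIDF classifier, the rule that a document is added only when both classifiers agree on its label, the ranking by the product of the two confidences, and the toy data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            self.counts[y].update(d)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        """Return (best label, posterior probability of that label)."""
        v = len(self.vocab)
        score = {}
        for c in self.classes:
            s = self.prior[c]
            for w in doc:
                if w in self.vocab:  # ignore words never seen in training
                    s += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            score[c] = s
        top = max(score.values())
        exp = {c: math.exp(s - top) for c, s in score.items()}
        best = max(exp, key=exp.get)
        return best, exp[best] / sum(exp.values())


def cosine(a, b):
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TfidfCentroid:
    """Assumed TFIDF classifier: one summed TFIDF vector per class, cosine match."""

    def fit(self, docs, labels):
        df = Counter(w for d in docs for w in set(d))
        self.idf = {w: math.log(len(docs) / df[w]) + 1.0 for w in df}
        self.centroids = defaultdict(Counter)
        for d, y in zip(docs, labels):
            for w, x in self._vec(d).items():
                self.centroids[y][w] += x
        return self

    def _vec(self, doc):
        tf = Counter(w for w in doc if w in self.idf)
        return {w: tf[w] * self.idf[w] for w in tf}

    def predict(self, doc):
        """Return (best label, cosine similarity to that class centroid)."""
        v = self._vec(doc)
        sims = {c: cosine(v, cen) for c, cen in self.centroids.items()}
        best = max(sims, key=sims.get)
        return best, sims[best]


def cooperative_train(labeled, labels, unlabeled, per_round=2):
    """Assumed loop: each round, add the unlabeled documents on which both
    classifiers agree most confidently, then retrain both classifiers."""
    docs, ys, pool = list(labeled), list(labels), list(unlabeled)
    while pool:
        nb = NaiveBayes().fit(docs, ys)
        tf = TfidfCentroid().fit(docs, ys)
        agreed = []
        for i, d in enumerate(pool):
            (y1, p1), (y2, s2) = nb.predict(d), tf.predict(d)
            if y1 == y2:
                agreed.append((p1 * s2, i, y1))
        if not agreed:
            break  # the classifiers agree on nothing that is left
        chosen = sorted(agreed, reverse=True)[:per_round]
        for _, i, y in chosen:
            docs.append(pool[i])
            ys.append(y)
        taken = {i for _, i, _ in chosen}
        pool = [d for i, d in enumerate(pool) if i not in taken]
    return NaiveBayes().fit(docs, ys), TfidfCentroid().fit(docs, ys)


# Toy data, invented for illustration only.
labeled = [s.split() for s in ["ball goal team", "stock market bank"]]
labels = ["sport", "finance"]
unlabeled = [s.split() for s in ["team goal match", "bank stock loan",
                                 "match team coach", "loan market rate"]]
nb_model, tfidf_model = cooperative_train(labeled, labels, unlabeled)
```

In this toy run the words `coach` and `rate` never occur in the labeled documents, so either classifier can only place a document made of such words after the unlabeled pool has been absorbed. That is the effect the incremental scheme is meant to produce, although the paper's actual selection criterion may differ.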

