期刊文献+

A fuzzy method to learn text classifier from labeled and unlabeled examples

A fuzzy method to learn text classifier from labeled and unlabeled examples
下载PDF
导出
摘要 In text classification, labeling documents is a tedious and costly task, as it would consume a lot of expert time. On the other hand, it usually is easier to obtain a lot of unlabeled documents, with the help of some tools like Digital Library, Crawler Programs, and Searching Engine. To learn text classifier from labeled and unlabeled examples, a novel fuzzy method is proposed. Firstly, a Seeded Fuzzy c-means Clustering algorithm is proposed to learn fuzzy clusters from a set of labeled and unlabeled examples. Secondly, based on the resulting fuzzy clusters, some examples with high confidence are selected to construct training data set. Finally, the constructed training data set is used to train Fuzzy Support Vector Machine, and get text classifier. Empirical results on two benchmark datasets indicate that, by incorporating unlabeled examples into learning process, the method performs significantly better than FSVM trained with a small number of labeled examples only. Also, the method proposed performs at least as well as the related method-EM with Nave Bayes. One advantage of the method proposed is that it does not rely on any parametric assumptions about the data as it is usually the case with generative methods widely used in semi-supervised learning. In text classification, labeling documents is a tedious and costly task, as it would consume a lot of expert time. On the other hand, it usually is easier to obtain a lot of unlabeled documents, with the help of some tools like Digital Library, Crawler Programs, and Searching Engine. To learn text classifier from labeled and unlabeled examples, a novel fuzzy method is proposed. Firstly, a Seeded Fuzzy c-means Clustering algorithm is proposed to learn fuzzy clusters from a set of labeled and unlabeled examples. Secondly, based on the resulting fuzzy clusters, some examples with high confidence are selected to construct training data set. Finally, the constructed training data set is used to train Fuzzy Support Vector Machine, and get text classifier. Empirical results on two benchmark datasets indicate that, by incorporating unlabeled examples into learning process, the method performs significantly better than FSVM trained with a small number of labeled examples only. Also, the method proposed performs at least as well as the related method-EM with Nave Bayes. One advantage of the method proposed is that it does not rely on any parametric assumptions about the data as it is usually the case with generative methods widely used in semi-supervised learning.
作者 刘宏 黄上腾
出处 《Journal of Harbin Institute of Technology(New Series)》 EI CAS 2004年第1期98-102,共5页 哈尔滨工业大学学报(英文版)
关键词 text categorization FUZZY CLUSTERING 文本分类器 文献标记 文献检索 模糊支持向量机
  • 相关文献

参考文献5

  • 1Kamal Nigam,Andrew Kachites Mccallum,Sebastian Thrun,Tom Mitchell.Text Classification from Labeled and Unlabeled Documents using EM[J].Machine Learning (-).2000(2-3)
  • 2NIGAM K.Text classification from labeled and unlabeled documents using EM[].Machine Learning.2000
  • 3BENSAID A M.Partially supervised clustering for image segmentation[].Pattern Recognition.1996
  • 4CRAVEN M.Learning to construct knowledge bases from the World Wide Web[].Artificial Intelligence.2000
  • 5SEBASTIANI F.Machine learning in automated text categorization[].ACM Computing Surveys.2002

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部