
Application of Different Degrees of Supervision in Automatic Text Categorization

Cited by: 1
Abstract: Automatic text categorization draws on information retrieval, pattern recognition, and machine learning. Taking the degree of supervision as its organizing thread, this paper surveys several methods that belong to fully supervised, unsupervised, and semi-supervised learning strategies: NBC (Naive Bayes Classifier), FCM (Fuzzy C-Means), SOM (Self-Organizing Map), ssFCM (semi-supervised Fuzzy C-Means), and gSOM (guided Self-Organizing Map), and applies them to text categorization. Among these, gSOM is a semi-supervised variant that the authors developed on the basis of SOM. Using Reuters-21578 as the corpus, the paper examines how the degree of supervision affects categorization performance and offers suggestions for practical text categorization work.
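Authors: Ding Lei (丁磊), Qian Yuntao (钱云涛)
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal, 2004, No. 6, pp. 65-68 (4 pages)
Keywords: supervision mechanism; automatic text categorization; information retrieval; pattern recognition; machine learning; text categorization; supervised learning; unsupervised learning; semi-supervised learning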

References (11)

  • [2] T. Mitchell. Machine Learning. McGraw-Hill, 1997.
  • [3] Y. Yang, X. Liu. A re-examination of text categorization methods. In 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99), 1999.
  • [4] Y. Yang. An evaluation of statistical approaches to text categorization. Journal of Information Retrieval, 1999, 1(1/2): 67-88.
  • [5] T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, 1997.
  • [6] A. M. Bensaid et al. Partially supervised clustering for image segmentation. Pattern Recognition, 1996, 29: 859-871.
  • [7] J. C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 1981.
  • [8] H. C. Yang, C. H. Lee. Automatic category generation for text documents by self-organizing maps. In IEEE Proceedings, 2000.
  • [9] D. Lewis. Reuters-21578 (Distribution 1.0). 2001. http://www.daviddlewis.com/resources
  • [10] G. Salton, C. Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 1988, 24(5): 513-523.
  • [11] M. F. Porter. An algorithm for suffix stripping. Program, 1980, 14(3): 130-137.

