Abstract
Automatic text categorization draws on information retrieval, pattern recognition, and machine learning. Taking the degree of supervision as its organizing thread, this paper surveys several methods belonging to fully supervised, unsupervised, and semi-supervised learning strategies: NBC (Naive Bayes Classifier), FCM (Fuzzy C-Means), SOM (Self-Organizing Map), ssFCM (semi-supervised Fuzzy C-Means), and gSOM (guided Self-Organizing Map), and applies them to text categorization. Among these, gSOM is a semi-supervised variant that we developed on the basis of SOM. Using Reuters-21578 as the corpus, we study how the degree of supervision affects categorization performance and, on that basis, offer suggestions for practical text categorization work.
Source
Computer Applications and Software (《计算机应用与软件》)
Indexed in: CSCD
Indexed in: Peking University Core Journals (北大核心)
2004, Issue 6, pp. 65-68 (4 pages)