Abstract
This paper reviews the key techniques of text categorization, with an emphasis on commonly used text feature selection methods. Using a support vector machine (SVM) as the classifier, we apply different feature selection methods to text categorization and experimentally compare and analyze the classification performance of the resulting classifiers. The experiments identify information gain (IG) and weight of evidence for text (WET) as the two best-performing feature selection methods. This conclusion provides a theoretical and practical basis for further work on optimizing classification performance.
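As an illustration of the information gain criterion named in the abstract, the following is a minimal sketch using the standard textbook definition, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], with binary term presence as the feature. The toy corpus, the function names, and the top-k selection step are assumptions made for illustration. They are not taken from the paper, which does not give its exact formulation or data here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """IG of a binary term-presence feature with respect to the class labels."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy after splitting on presence/absence, weighted by split size.
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain (hypothetical helper)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Toy corpus (assumed for illustration): each document is a set of tokens.
docs = [{"oil", "price"}, {"oil", "drilling"}, {"match", "goal"}, {"goal", "price"}]
labels = ["energy", "energy", "sport", "sport"]

top_terms = select_features(docs, labels, 2)
```

In this toy corpus, "oil" and "goal" each separate the two classes perfectly, while "price" appears equally in both and carries no information. In a pipeline like the one the abstract describes, the selected terms would then define the feature vectors passed to the SVM classifier.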
Source
Inner Mongolia Petrochemical Industry (《内蒙古石油化工》)
CAS
2011, Issue 19, pp. 18-20 (3 pages)
Keywords
Text Categorization
Mutual Information
Information Gain
SVM
Feature Selection