Abstract
A text classification method based on fuzzy soft set theory is proposed to improve text classification accuracy. The text training set is represented as a fuzzy soft set table, and the category of a new text is determined by reducing the soft set table and constructing a soft set comparison table. Because closely related (redundant) features produced during text feature extraction degrade classification accuracy, the paper also gives a feature selection algorithm based on normalized mutual information, which effectively addresses this problem. Compared with the traditional KNN and SVM classification algorithms, the fuzzy soft set method improves both classification precision and accuracy.
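The abstract names a feature selection algorithm based on normalized mutual information but does not give its formula. The sketch below is a hypothetical NMIFS-style greedy criterion, not the paper's exact algorithm: each candidate term is scored by its mutual information with the class label minus its average normalized mutual information with the terms already selected, so a term that duplicates a selected one is penalized. Features are assumed to be discrete (for example, binary term presence per document), and `select_features`, `normalized_mi` and the min-entropy normalization are illustrative choices.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(xs: Sequence) -> float:
    """Empirical Shannon entropy (natural log) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_info(xs: Sequence, ys: Sequence) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from co-occurrence counts."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs: Sequence, ys: Sequence) -> float:
    """Assumed normalization: I(X;Y) / min(H(X), H(Y)), in [0, 1]; 0 if either is constant."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_info(xs, ys) / denom if denom > 0 else 0.0


def select_features(features: List[Sequence], labels: Sequence, k: int) -> List[int]:
    """Hypothetical greedy selection: relevance minus average normalized redundancy."""
    remaining = list(range(len(features)))
    selected: List[int] = []
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = mutual_info(features[i], labels)
            if not selected:
                return relevance
            redundancy = sum(normalized_mi(features[i], features[s]) for s in selected)
            return relevance - redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: one column per term, one entry per training document.
labels = [0, 0, 0, 1, 1, 1]
features = [
    [1, 1, 1, 0, 0, 0],  # term A: perfectly predictive
    [1, 1, 1, 0, 0, 0],  # term B: exact duplicate of A (redundant)
    [1, 1, 0, 0, 0, 1],  # term C: weakly predictive, not redundant with A
]
print(select_features(features, labels, 2))  # -> [0, 2]
```

Term B is as relevant as term A, but once A is selected its normalized redundancy with A is 1, so the weaker but non-redundant term C is picked instead. This is the kind of near-duplicate feature the abstract says degrades accuracy.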
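The abstract does not say how the fuzzy soft set table is built or reduced, so the following is only a sketch under stated assumptions. It uses the standard comparison-table decision rule for fuzzy soft sets: entry c_ij counts the parameters for which object i has membership at least that of object j, and each object is scored as row sum minus column sum. Here the objects are assumed to be categories, the parameters are terms, and "reduction" is assumed to mean keeping only the terms that occur in the new text. The membership values, the names `reduce_table`, `comparison_scores` and `classify`, and the toy data are all hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: term (parameter) -> {category (object) -> fuzzy membership in [0, 1]}
FuzzySoftSet = Dict[str, Dict[str, float]]


def reduce_table(fss: FuzzySoftSet, doc_terms: List[str]) -> FuzzySoftSet:
    """Assumed reduction: keep only parameters (terms) that appear in the new text."""
    return {t: fss[t] for t in doc_terms if t in fss}


def comparison_scores(fss: FuzzySoftSet, categories: List[str]) -> Dict[str, int]:
    """Comparison table c[i][j] = #parameters with mu(i) >= mu(j); score = row sum - column sum."""
    n = len(categories)
    c = [[0] * n for _ in range(n)]
    for memberships in fss.values():
        for i, ci in enumerate(categories):
            for j, cj in enumerate(categories):
                if memberships.get(ci, 0.0) >= memberships.get(cj, 0.0):
                    c[i][j] += 1
    return {
        ci: sum(c[i]) - sum(c[k][i] for k in range(n))
        for i, ci in enumerate(categories)
    }


def classify(fss: FuzzySoftSet, categories: List[str], doc_terms: List[str]) -> str:
    """Assign the new text to the category with the highest comparison score."""
    scores = comparison_scores(reduce_table(fss, doc_terms), categories)
    return max(categories, key=lambda cat: scores[cat])


# Toy table (made-up memberships, e.g. normalized term weights per category).
categories = ["sports", "finance", "tech"]
fss: FuzzySoftSet = {
    "goal":  {"sports": 0.9, "finance": 0.1, "tech": 0.2},
    "stock": {"sports": 0.1, "finance": 0.8, "tech": 0.3},
    "match": {"sports": 0.7, "finance": 0.2, "tech": 0.1},
    "chip":  {"sports": 0.0, "finance": 0.2, "tech": 0.9},
}
new_text_terms = ["goal", "match", "chip"]
print(comparison_scores(reduce_table(fss, new_text_terms), categories))
# -> {'sports': 2, 'finance': -2, 'tech': 0}
print(classify(fss, categories, new_text_terms))  # -> sports
```

Because the decision counts pairwise "at least as strong" comparisons instead of summing raw memberships, a single very large weight cannot dominate the vote. This is one plausible reading of why a comparison table is used, not a claim about the paper's own rationale.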
Source
Computer Engineering
Indexed in: CAS, CSCD, Peking University Core Journals
2010, No. 13, pp. 90-92 (3 pages)
Funding
Supported by the Natural Science Foundation of Guangdong Province (9151001003000005)
Keywords
text classification
soft set
fuzzy soft set
feature selection
mutual information