Abstract
To classify and extract useful information quickly and accurately from massive information repositories, this paper proposes a hybrid Web text classification model based on rough sets and K nearest neighbor (KNN). The attribute reduction theory of rough sets is used to reduce the vector dimensionality in the text classification process, with an attribute reduction algorithm based on the discernibility matrix. Feature selection uses mutual information. Experiments were conducted on the hybrid algorithm, and a comparison with the traditional KNN method verifies its feasibility.
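The abstract does not give the details of the algorithm, so the following is only a minimal sketch of how discernibility-matrix attribute reduction might feed a KNN classifier. It rests on several assumptions that are not taken from the paper: documents are binary term-presence vectors, the reduct is found with a simple greedy covering heuristic, KNN uses Hamming distance, and the mutual information feature selection step is left out. All function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def discernibility_entries(rows, labels):
    """For each pair of objects with different labels, collect the set of
    attribute indices on which the two objects differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            diff = frozenset(
                a for a in range(len(rows[i])) if rows[i][a] != rows[j][a]
            )
            if diff:  # an empty set means the pair is inconsistent; skip it
                entries.append(diff)
    return entries


def greedy_reduct(rows, labels):
    """Assumed heuristic: repeatedly pick the attribute that occurs in the
    most uncovered entries until every entry contains a chosen attribute."""
    remaining = discernibility_entries(rows, labels)
    chosen = []
    while remaining:
        counts = Counter(a for entry in remaining for a in entry)
        # Highest count wins; ties go to the lowest attribute index.
        best = min(counts, key=lambda a: (-counts[a], a))
        chosen.append(best)
        remaining = [e for e in remaining if best not in e]
    return sorted(chosen)


def knn_predict(train_rows, train_labels, query, attrs, k=3):
    """Majority vote among the k training rows closest to the query, using
    Hamming distance restricted to the selected attributes."""
    def dist(row):
        return sum(row[a] != query[a] for a in attrs)

    nearest = sorted(
        range(len(train_rows)), key=lambda i: (dist(train_rows[i]), i)
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy decision table: four documents, four binary term features.
rows = [(1, 0, 1, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 1, 1, 1)]
labels = ["sports", "sports", "finance", "finance"]

reduct = greedy_reduct(rows, labels)
prediction = knn_predict(rows, labels, (1, 1, 0, 1), reduct, k=3)
```

In this toy table a single attribute is enough to separate the two classes, so KNN runs on a one-dimensional vector instead of four dimensions. A greedy cover is not guaranteed to return a minimal reduct; it is used here only because it is short.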
Source
Journal of Anhui University of Science and Technology: Natural Science
CAS
2008, Issue 4, pp. 89-92 (4 pages)
Keywords
web text classification
rough set
K nearest neighbor (KNN)
attribute reduction