Abstract
In text categorization, overlap between classes and a shortage of class-indicating features give rise to fuzzy uncertainty and rough uncertainty at class boundaries, neither of which the conventional k-nearest neighbor (k-NN) method can handle. Moreover, in the conventional k-NN method and in several improved variants, the optimal value of k has to be obtained through training. This paper uses fuzzy-rough set theory to improve the conventional k-NN method and adopts a distance-based neighbor space to determine, without training, a suitable value of k for each text to be classified. The proposed method is compared experimentally with several other k-NN methods, and the results indicate that the fuzzy-rough set approach can improve the precision and recall of text categorization to a certain degree.
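The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, whose formulas are not given here; it only assumes that (a) the neighbor space is every training text within a fixed distance `radius` of the query, so the number of neighbors k differs from text to text and needs no training, and (b) each training text carries fuzzy class memberships, which are averaged with similarity weights over the neighbor space. The names `cosine_distance`, `classify`, and `radius`, and the choice of cosine distance, are all hypothetical.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0  # an empty vector is treated as maximally distant
    return 1.0 - dot / (na * nb)


def classify(doc, training, radius=0.5):
    """Score `doc` against `training`, a list of (vector, memberships) pairs.

    `memberships` maps a class label to a fuzzy membership in [0, 1].
    Returns (scores, k), where k is the size of the neighbor space.
    Assumption: this weighting scheme is illustrative, not the paper's.
    """
    # Neighbor space: all training texts within `radius` of the query.
    neighbors = []
    for vec, memberships in training:
        d = cosine_distance(doc, vec)
        if d <= radius:
            neighbors.append((d, memberships))

    k = len(neighbors)
    if k == 0:
        return {}, 0

    # Similarity-weighted average of the neighbors' fuzzy memberships.
    scores = defaultdict(float)
    for d, memberships in neighbors:
        similarity = 1.0 - d
        for label, mu in memberships.items():
            scores[label] += similarity * mu
    return {label: s / k for label, s in scores.items()}, k


if __name__ == "__main__":
    training = [
        ({"ball": 1.0, "goal": 1.0}, {"sports": 1.0}),
        ({"ball": 1.0, "team": 1.0}, {"sports": 0.8, "business": 0.2}),
        ({"stock": 1.0, "market": 1.0}, {"business": 1.0}),
    ]
    print(classify({"ball": 1.0, "goal": 1.0}, training, radius=0.6))
```

In this sketch the class scores need not sum to 1. A text whose neighbors are all dissimilar or split across classes receives low scores everywhere, which is one simple way to expose the uncertainty at class boundaries instead of forcing a confident label.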
Source
《华南理工大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2004, Issue z1, pp. 73-76 (4 pages)
Journal of South China University of Technology(Natural Science Edition)
Keywords
fuzzy-rough set
fuzzy-rough membership function
k-nearest neighbor method
text categorization
neighbor space