Abstract
The k-nearest-neighbor (KNN) algorithm is a widely used classification algorithm in data mining. However, when the sample size is large and the feature attributes are numerous, both the classification accuracy and the efficiency of KNN drop considerably. This paper proposes an improved KNN algorithm, PRMKNN, which applies principal component analysis (PCA) and rough set (RS) theory to feature extraction: PCA is first used to screen the input vectors, and rough set theory is then used to reduce the vectors that are irrelevant or only weakly relevant to classification. A simulated annealing algorithm is then used to select random attribute subsets, on which multiple KNN classifiers are built, and finally the outputs of these classifiers are combined by simple voting. The method effectively improves the classification accuracy and efficiency of KNN.
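The ensemble stage described in the abstract (attribute subsets chosen by simulated annealing, one KNN classifier per subset, simple voting over their outputs) can be illustrated with a minimal sketch. It is not the paper's implementation: the PCA and rough-set reduction steps are omitted, and the annealing objective (leave-one-out accuracy), the swap move, the cooling schedule, the toy data, and all function names are assumptions made only for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training points, with Euclidean
    distance computed on the given feature indices only."""
    neighbors = sorted(
        (math.dist([row[f] for f in features], [x[f] for f in features]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of KNN restricted to a feature subset
    (assumed objective; the abstract does not state one)."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], features, k) == y[i]
    return hits / len(X)


def anneal_subset(X, y, size, rng, steps=30, t0=0.1, cooling=0.9, k=3):
    """Pick a feature subset of the given size by simulated annealing."""
    n_features = len(X[0])
    current = rng.sample(range(n_features), size)
    score = loo_accuracy(X, y, current, k)
    temp = t0
    for _ in range(steps):
        unused = [f for f in range(n_features) if f not in current]
        if not unused:
            break
        # Neighbor move: swap one selected feature for an unselected one.
        candidate = current.copy()
        candidate[rng.randrange(size)] = rng.choice(unused)
        cand_score = loo_accuracy(X, y, candidate, k)
        # Always accept ties and improvements; accept worse moves with
        # a probability that shrinks as the temperature cools.
        if cand_score >= score or rng.random() < math.exp((cand_score - score) / temp):
            current, score = candidate, cand_score
        temp *= cooling
    return current


def ensemble_predict(X, y, x, subsets, k=3):
    """Simple voting over one KNN classifier per feature subset."""
    votes = Counter(knn_predict(X, y, x, s, k) for s in subsets)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): features 0 and 1 separate the classes,
# features 2 and 3 carry no class information.
X = [
    [0.0, 0.2, 1.0, 0.0], [0.3, 0.1, 0.0, 1.0],
    [0.1, 0.4, 1.0, 1.0], [0.2, 0.0, 0.0, 0.0],
    [10.0, 9.8, 1.0, 0.0], [9.7, 10.1, 0.0, 1.0],
    [10.2, 9.9, 1.0, 1.0], [9.9, 10.3, 0.0, 0.0],
]
y = ["a"] * 4 + ["b"] * 4

rng = random.Random(0)  # fixed seed so the run is repeatable
subsets = [anneal_subset(X, y, size=3, rng=rng) for _ in range(3)]
pred_a = ensemble_predict(X, y, [0.1, 0.1, 0.5, 0.5], subsets)
pred_b = ensemble_predict(X, y, [9.9, 10.0, 0.5, 0.5], subsets)
print(subsets, pred_a, pred_b)
```

Each member classifier sees only its own attribute subset, so a feature that misleads one classifier can be outvoted by the others; this is the usual motivation for combining several KNN classifiers instead of running a single one on all attributes.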
Source
Computer Knowledge and Technology (电脑知识与技术, back issues)
2010, Issue 3X, pp. 1989-1991 (3 pages)
Keywords
principal component analysis
rough set
simulated annealing
k-nearest-neighbor
combination model