Abstract
Classification is an important problem in data mining. In general, a classification algorithm is first applied to sample data to derive classification rules, and those rules are then used to assign new data to classes. This paper describes in detail two simple but effective classification methods: the linear classifier based on the least squares method and the k-nearest neighbor (KNN) classifier. A comparison of the two shows that the linear classifier is computationally simple and yields a low-variance fit, which makes it suitable for data in which the regions of overlap between classes are relatively small. The KNN classifier is more flexible and yields a fit with relatively low bias, but it is computationally more expensive; it is better suited to data sets in which the class boundaries are not clear-cut and the classes intersect or overlap substantially.
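The two classifiers named in the abstract can be sketched in a few lines. What follows is a minimal illustration and not the paper's own implementation: it assumes two classes coded as 0 and 1, a decision threshold of 0.5, Euclidean distance, and majority voting for KNN. The function names and the toy data are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of 0/1 labels on [1, X]; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])      # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimize ||A @ beta - y||^2
    return beta

def predict_linear(beta, X):
    """Assign class 1 where the fitted value exceeds 0.5 (assumed threshold)."""
    A = np.column_stack([np.ones(len(X)), X])
    return (A @ beta > 0.5).astype(int)

def predict_knn(X_train, y_train, X, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Squared distances between every query point and every training point.
    d = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]         # indices of the k closest points
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Hypothetical toy data: two well-separated clusters.
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                    [3, 3], [3, 4], [4, 3], [4, 4]], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_query = np.array([[0.5, 0.5], [3.5, 3.5]])

linear_labels = predict_linear(fit_linear(X_train, y_train), X_query)
knn_labels = predict_knn(X_train, y_train, X_query, k=3)
print(linear_labels, knn_labels)
```

The linear classifier stores only the coefficient vector after fitting, whereas this KNN sketch keeps the whole training set and computes a distance to every training point at prediction time, which is the source of the higher computational cost noted in the abstract.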
Source
Journal of Changsha Telecommunications and Technology Vocational College (《长沙通信职业技术学院学报》)
2010, No. 4, pp. 22-25 (4 pages)
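Funding
Supported by a project of the Hunan Provincial Department of Science and Technology (2009GK3014)
and a project of the Hunan Provincial Department of Education (09c636)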
Keywords
least squares
nearest neighbor
classification