Abstract
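To test whether the theoretical training sample size (10p to 30p) applies to parametric classifiers (such as maximum likelihood classification, MLC) and non-parametric classifiers (such as support vector machines, SVM), and to examine how sample characteristics (spectral statistics and spatial distribution) affect classification accuracy, training sets of different sizes were used for MLC and SVM classification, and the relationship between classification accuracy and the samples was analyzed. The experiments show that the accuracy of both MLC and SVM rises as the sample size grows and then levels off, with MLC accuracy rising faster than SVM accuracy. MLC is strongly affected by sample size: with small samples (5), its accuracy is unstable, and it stabilizes once more than 30 samples are used. SVM achieves high and stable accuracy even with small samples (5), so it is suited to small-sample classification and is not bound by the theoretical sample size. Once the sample size exceeds the minimum theoretical value (30), MLC outperforms SVM, mainly because additional samples readily give MLC more informative samples, whereas the number of informative boundary samples for SVM increases only slightly. The results provide an experimental basis for further optimizing training samples for classification.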
Assessing how training sample size affects the accuracy and performance of parametric and non-parametric classifiers is of great significance. The theoretical training sample size (10p to 30p, where p denotes the number of bands in the remote sensing image) is widely used as a criterion for training sample selection. However, parametric and non-parametric classifiers work on different principles, so the theoretical sample size may not be suitable for all classifiers. This paper analyzes classification accuracy under different training sample sizes, using maximum likelihood classification (MLC) as a typical parametric classifier and support vector machines (SVM) as a typical non-parametric classifier. The results show that the accuracies of both MLC and SVM improve and then stabilize as the number of samples increases. Notably, MLC accuracy increases faster than SVM accuracy, because additional samples give MLC more informative training samples that describe the land cover, whereas the informative training samples for SVM are the edge pixels of the land cover feature space. For MLC, accuracy fluctuates markedly with 5 training samples and stabilizes with more than 30, which indicates that MLC is sensitive to training sample size. SVM, as a non-parametric classifier, achieves higher and more stable accuracy than MLC with few samples, even with only 5, which indicates that SVM is suitable for small training sets and is not bound by the theoretical training sample size. When more than 30 training samples are used, MLC achieves higher accuracy than SVM. Under this condition, the training set can describe the normal spectral feature space required by MLC, whereas samples selected randomly from the training sample collection do not contain enough informative pixels to construct the support vectors on which SVM is based. Because the classifiers rest on different principles, training sample size affects their land cover classification accuracy differently, and the theoretical training sample size is not the sole criterion for determining training sample size. Optimized training sample selection tailored to each classifier's principle will be explored further on the basis of these results.
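The sample-size experiment described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data. It assumes synthetic pixels, two hypothetical land cover classes, p = 3 bands, and a diagonal-covariance Gaussian classifier as a simplified stand-in for MLC. The SVM side of the comparison is omitted because a standard-library-only SVM would be too long for a short example. All function names and parameters are illustrative.

```python
import math
import random
import statistics


def make_pixels(rng, n, means, sigma=1.0):
    """Draw n synthetic pixels (one value per band) around the class means."""
    return [[rng.gauss(m, sigma) for m in means] for _ in range(n)]


def fit_gaussian(samples):
    """Estimate per-band mean and variance.

    Assumption: bands are treated as independent (diagonal covariance),
    which simplifies full maximum likelihood classification.
    """
    bands = list(zip(*samples))
    return [(statistics.mean(b), statistics.variance(b)) for b in bands]


def log_likelihood(pixel, params):
    """Gaussian log-likelihood of one pixel under one class model."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for x, (mu, var) in zip(pixel, params)
    )


def classify(pixel, models):
    """Assign the class whose model gives the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(pixel, models[label]))


def accuracy_by_sample_size(sizes, repeats=20, seed=0):
    """Return {n: (mean accuracy, std of accuracy)} over repeated random draws."""
    rng = random.Random(seed)
    # Hypothetical class means for p = 3 bands; not taken from the paper.
    class_means = {"water": [0.0, 0.0, 0.0], "forest": [2.0, 2.0, 2.0]}
    test = [
        (label, px)
        for label, means in class_means.items()
        for px in make_pixels(rng, 200, means)
    ]
    summary = {}
    for n in sizes:
        scores = []
        for _ in range(repeats):
            # Train on a fresh random draw of n samples per class.
            models = {
                label: fit_gaussian(make_pixels(rng, n, means))
                for label, means in class_means.items()
            }
            hits = sum(classify(px, models) == label for label, px in test)
            scores.append(hits / len(test))
        summary[n] = (statistics.mean(scores), statistics.stdev(scores))
    return summary


if __name__ == "__main__":
    p = 3  # assumed number of bands
    sizes = [5, 10 * p, 30 * p]  # small sample, then the 10p and 30p bounds
    for n, (mean_acc, spread) in accuracy_by_sample_size(sizes).items():
        print(f"n={n:3d}  mean accuracy={mean_acc:.3f}  std={spread:.3f}")
```

Comparing the standard deviation across the three sample sizes gives a rough sense of the stability that the abstract reports for MLC. The actual numbers depend entirely on the synthetic setup assumed here.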
Source
《遥感技术与应用》
CSCD
PKU Core Journal (北大核心)
2016, Issue 4, pp. 748-755 (8 pages)
Remote Sensing Technology and Application
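Funding
National Natural Science Foundation of China, Young Scientists Fund (41301444)
Beijing Municipal Education Commission "Young Talents Program" for Beijing institutions of higher education
Beijing Polytechnic College general internal research project (bgzyky201518)
Major Science and Technology Project of the National High-Resolution Earth Observation System special program
Keywords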
Sample characteristics
Classification accuracy
Spectral Discrete Overlap Degree (SDOD)
Maximum Likelihood Classification (MLC)
Support Vector Machine (SVM)