Abstract
In computer-aided diagnosis (CAD), labeled cases are difficult to collect, which leads to a small sample learning problem. To address it, a small sample learning algorithm based on a novel hybrid class-labeling technique (HCLT) was proposed. HCLT labels the abundant unlabeled samples with three different schemes, based respectively on geometric distance (similarity), probability distribution and semantic concept. Only the unlabeled samples that receive the same label from all three schemes are added to the training set, which enlarges the labeled training set. Some labeling mistakes may still remain. To reduce their adverse effect on learning, a membership was defined for each pseudo-labeled sample and introduced into fuzzy support vector machine (FSVM) learning, so that the membership controls how much each pseudo-labeled sample contributes to the learning task. Classification experiments on UCI datasets show that the proposed algorithm is effective for the small sample learning problem. Compared with algorithms that use a single class-labeling scheme, the proposed algorithm produces significantly fewer mislabeled samples and achieves significantly better classification performance.
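The abstract describes the pipeline only at a high level, so the following Python sketch shows its overall structure under explicit assumptions; it is not the paper's implementation. Assumed: binary labels in {-1, +1}; k-nearest-neighbour voting for the geometric view; Gaussian naive Bayes for the probabilistic view; a nearest-class-centroid rule as a hypothetical stand-in for the semantic-concept view (the abstract does not say how that view is computed); the naive Bayes posterior as a hypothetical pseudo-label membership; and a linear FSVM trained by subgradient descent instead of a kernel QP solver. All function names are illustrative.

```python
import numpy as np

# Sketch only: the three labelers and the membership formula are
# hypothetical stand-ins, not the definitions used in the paper.
# Labels are binary, in {-1, +1}.

def knn_label(X_l, y_l, X_u, k=3):
    """Geometric view: majority vote of the k nearest labeled samples."""
    d = np.linalg.norm(X_u[:, None, :] - X_l[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.where(y_l[nearest].sum(axis=1) >= 0, 1, -1)

def gnb_label(X_l, y_l, X_u, eps=1e-6):
    """Probabilistic view: Gaussian naive Bayes; also returns max posterior."""
    log_post = []
    for c in (-1, 1):
        Xc = X_l[y_l == c]
        mu, var = Xc.mean(axis=0), Xc.var(axis=0) + eps
        ll = -0.5 * (np.log(2 * np.pi * var) + (X_u - mu) ** 2 / var).sum(axis=1)
        log_post.append(ll + np.log(len(Xc) / len(X_l)))
    diff = np.clip(log_post[0] - log_post[1], -500, 500)
    p_pos = 1.0 / (1.0 + np.exp(diff))  # P(y = +1 | x)
    return np.where(p_pos >= 0.5, 1, -1), np.maximum(p_pos, 1.0 - p_pos)

def centroid_label(X_l, y_l, X_u):
    """Hypothetical stand-in for the 'semantic concept' view: nearest class prototype."""
    d_neg = np.linalg.norm(X_u - X_l[y_l == -1].mean(axis=0), axis=1)
    d_pos = np.linalg.norm(X_u - X_l[y_l == 1].mean(axis=0), axis=1)
    return np.where(d_pos <= d_neg, 1, -1)

def hybrid_pseudo_label(X_l, y_l, X_u):
    """Keep only the unlabeled samples on which all three views agree."""
    a = knn_label(X_l, y_l, X_u)
    b, conf = gnb_label(X_l, y_l, X_u)
    c = centroid_label(X_l, y_l, X_u)
    keep = (a == b) & (b == c)
    # Assumption: the naive Bayes confidence serves as the membership s_i.
    return X_u[keep], a[keep], conf[keep]

def train_fsvm(X, y, s, C=1.0, lr=0.01, epochs=500):
    """Linear fuzzy SVM in primal form:
        min 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i (w.x_i + b))
    solved approximately by full-batch subgradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples violating the margin
        coef = C * s[active] * y[active]
        w -= lr * (w - (coef[:, None] * X[active]).sum(axis=0))
        b -= lr * (-coef.sum())
    return w, b

def predict(w, b, X):
    return np.where(X @ w + b >= 0, 1, -1)

def demo(seed=0):
    """Synthetic example: 10 labeled points, 200 unlabeled, 400 test points."""
    rng = np.random.default_rng(seed)

    def blobs(n):  # two Gaussian classes centred at (-2, -2) and (+2, +2)
        y = np.repeat([-1, 1], n)
        X = rng.normal(0.0, 0.6, size=(2 * n, 2)) + 2.0 * y[:, None]
        return X, y

    X_l, y_l = blobs(5)
    X_u, _ = blobs(100)  # true labels of the pool are never used
    X_p, y_p, s_p = hybrid_pseudo_label(X_l, y_l, X_u)
    X = np.vstack([X_l, X_p])
    y = np.concatenate([y_l, y_p])
    s = np.concatenate([np.ones(len(y_l)), s_p])  # labeled samples: s_i = 1
    w, b = train_fsvm(X, y, s)
    X_t, y_t = blobs(200)
    return (predict(w, b, X_t) == y_t).mean()
```

Calling `demo()` returns the test accuracy of the enlarged, membership-weighted training set on fresh synthetic samples. The membership s_i scales each sample's slack penalty, so a low-confidence pseudo-label pulls the decision boundary less than a genuinely labeled sample with s_i = 1.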
Source
Journal of Zhejiang University (Engineering Science) (《浙江大学学报(工学版)》), 2016, No. 1, pp. 137-143 (7 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
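Funding
Supported by the Zhejiang Provincial Natural Science Foundation of China (LY13H180011)
Supported by the Zhejiang Provincial Natural Science Foundation of China (LY15F020021)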