A novel dynamic batch selective sampling algorithm based on version space analysis is presented. In traditional batch selective sampling, example selection is determined entirely by the existing, unreliable classification boundary; moreover, within a batch, examples that have already been labeled provide no guidance for selecting the rest. As a result, refining the model with examples selected in batch mode can degrade classification performance. Based on the duality between feature space and parameter space under the SVM active learning framework, dynamic batch selective sampling is proposed to address this problem. A batch of examples is selected dynamically, using the previously labeled examples as guidance for further selection. In this way, the selection of feedback examples is determined by both the existing classification model and the examples labeled previously. Encouraging experimental results demonstrate the effectiveness of the proposed algorithm.
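The contrast between static and dynamic batch selection can be illustrated with a toy example. The sketch below is not the SVM-based algorithm of the paper: it assumes a one-dimensional threshold classifier whose version space is simply an interval `(lo, hi)`, and the names `static_batch`, `dynamic_batch`, `shrink`, and `oracle` are hypothetical helpers introduced only for illustration.

```python
def static_batch(pool, lo, hi, k):
    """Pick the k pool points closest to the current boundary in one shot."""
    boundary = (lo + hi) / 2
    return sorted(pool, key=lambda x: abs(x - boundary))[:k]


def shrink(lo, hi, x, label):
    """Tighten the version space (lo, hi) using one labeled point."""
    if label > 0:
        return lo, min(hi, x)   # positive label: threshold is at or below x
    return max(lo, x), hi       # negative label: threshold is above x


def dynamic_batch(pool, lo, hi, k, oracle):
    """Pick k points one at a time; each label guides the next pick."""
    remaining = list(pool)
    chosen = []
    for _ in range(min(k, len(remaining))):
        boundary = (lo + hi) / 2
        x = min(remaining, key=lambda v: abs(v - boundary))
        remaining.remove(x)
        chosen.append(x)
        lo, hi = shrink(lo, hi, x, oracle(x))
    return chosen, (lo, hi)


def oracle(x, true_threshold=72):
    """Toy labeler (assumption): +1 at or above the hidden threshold."""
    return 1 if x >= true_threshold else -1


pool = list(range(1, 100))

# Static batch: all three picks crowd around the initial boundary at 50.
static_pick = static_batch(pool, 0, 100, 3)
lo, hi = 0, 100
for x in static_pick:
    lo, hi = shrink(lo, hi, x, oracle(x))
static_space = (lo, hi)

# Dynamic batch: the boundary moves after every label.
dynamic_pick, dynamic_space = dynamic_batch(pool, 0, 100, 3, oracle)
```

In this toy setting the static batch picks 50, 49, and 51 and leaves the interval (51, 100), while the dynamic batch picks 50, 75, and 62 and narrows it to (62, 75) with the same number of labels.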
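By selecting the most informative examples and submitting them to an expert for labeling, active learning can effectively reduce the burden of labeling large numbers of unlabeled examples. Sampling is a key factor in the performance of active learning algorithms. Current mainstream sampling algorithms typically try to select examples that bisect the version space as evenly as possible. This approach, however, assumes that every hypothesis in the version space is equally likely to be the target function, an assumption that cannot be satisfied in real-world problems. The limitations of the bisection strategy are analyzed, and a heuristic sampling algorithm, MPWPS (the most possibly wrong-predicted sampling), is proposed that aims to shrink the version space as much as possible. At each sampling step the algorithm selects the example that the current classifier is most likely to mispredict, thereby eliminating more than half of the hypotheses in the version space. As a result, the classifier reaches a given classification accuracy with fewer samples than current mainstream active learning algorithms that aim to bisect the version space. Experiments show that on most datasets MPWPS needs fewer samples than traditional sampling algorithms to reach the same target accuracy.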
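The claim that a mispredicted example removes more than half of the version space can be checked on a small finite hypothesis class. The sketch below does not implement the MPWPS selection heuristic, which the abstract does not specify; it only measures how much of the version space a query eliminates once its true label is known. The threshold hypotheses and the helper names `predict`, `majority_vote`, and `eliminated_fraction` are assumptions made for illustration.

```python
def predict(threshold, x):
    """A threshold hypothesis: +1 at or above the threshold, else -1."""
    return 1 if x >= threshold else -1


def majority_vote(version_space, x):
    """Prediction of the 'current classifier', taken here as a majority vote."""
    total = sum(predict(t, x) for t in version_space)
    return 1 if total >= 0 else -1


def eliminated_fraction(version_space, x, true_label):
    """Share of hypotheses contradicted by the labeled example (x, true_label)."""
    wrong = [t for t in version_space if predict(t, x) != true_label]
    return len(wrong) / len(version_space)


# Ten candidate thresholds; the hidden target threshold is assumed to be 9,
# so both x = 5 and x = 8 have true label -1.
version_space = list(range(1, 11))

bisecting = eliminated_fraction(version_space, 5, -1)     # votes split 5 vs 5
vote_on_8 = majority_vote(version_space, 8)               # majority says +1
mispredicted = eliminated_fraction(version_space, 8, -1)  # majority was wrong
```

Here the bisecting query at 5 eliminates half of the hypotheses, while the query at 8, which the majority mispredicts, eliminates 80% of them. Had the majority been right at 8, only 20% would have been removed, so the benefit depends on how well the sampler estimates which examples are likely to be mispredicted.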
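Multiclass classification is an important problem in machine learning. The widely used "one versus all" (OvA) approach builds a multiclass classifier by repeatedly constructing "standard" binary classifiers, which leads to high computational complexity and low classification efficiency. Multiclass classifiers based on support vector machines do not need to construct binary classifiers repeatedly, but because the SVM solution corresponds to the center of the largest hypersphere inside the version space, its generalization ability degrades markedly when the version space is asymmetric or elongated. M-ACM, a multiclass classification algorithm based on the analytic center of the version space, overcomes these problems. The generalization performance of this classifier is analyzed theoretically, an upper bound on its generalization error is given, and the result is verified experimentally.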
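For orientation, the two notions of "center" can be written down for the binary linear case. This is a sketch using the standard textbook definitions, not the multiclass M-ACM formulation, which the abstract does not give; the training pairs $(x_i, y_i)$, the weight vector $w$, and the normalization $\|w\| = 1$ are assumptions introduced here.

```latex
% Version space of a linear classifier on n labeled examples (assumed notation)
\mathcal{V} = \{\, w : y_i\, w^{\top} x_i > 0,\ i = 1, \dots, n,\ \|w\| = 1 \,\}

% SVM solution: center of the largest hypersphere inscribed in V,
% i.e. the point whose smallest distance to any bounding hyperplane is largest
w_{\mathrm{SVM}} = \arg\max_{w \in \mathcal{V}} \ \min_{i} \ \frac{y_i\, w^{\top} x_i}{\|x_i\|}

% Analytic center: minimizer of the logarithmic barrier over all constraints
w_{\mathrm{AC}} = \arg\min_{w \in \mathcal{V}} \ - \sum_{i=1}^{n} \log\!\left( y_i\, w^{\top} x_i \right)
```

The inscribed-sphere center is fixed by the few constraints that touch the sphere, whereas the logarithmic barrier sums over every constraint, so the analytic center reflects the overall shape of an elongated or asymmetric version space.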