Abstract
To address quality control in crowdsourced data processing, this paper proposes a weighted K-nearest neighbor voting method. Rather than returning an answer based only on the labels of a single instance, the method also takes the labels of that instance's neighbors into account to obtain a more accurate answer. Appropriate weights are applied to the neighbors to further improve performance, while the simplicity of traditional majority voting is retained. K-nearest neighbor voting can effectively handle instances that have few or no labels, and weighting the neighbors mitigates the effect of imbalanced labels, which improves the generalization ability of the algorithm. Experiments in a variety of scenarios show that the proposed weighted K-nearest neighbor voting method achieves good results.
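The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the distance measure or the weighting scheme, so the Euclidean distance, the inverse-distance weight `1 / (1 + d)`, the unit weight for an instance's own labels, and the function name `weighted_knn_vote` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def weighted_knn_vote(features, crowd_labels, k=3):
    """Aggregate crowd labels per instance (illustrative sketch only).

    features:     list of numeric feature tuples, one per instance
    crowd_labels: list of lists; crowd_labels[i] holds the labels that
                  workers gave to instance i (may be empty)
    k:            number of nearest neighbors whose labels also vote
    """
    n = len(features)
    aggregated = []
    for i in range(n):
        # Rank all other instances by Euclidean distance (assumed metric).
        neighbors = sorted(
            (math.dist(features[i], features[j]), j)
            for j in range(n) if j != i
        )[:k]

        scores = defaultdict(float)
        # The instance's own labels vote with weight 1 (assumption).
        for label in crowd_labels[i]:
            scores[label] += 1.0
        # Neighbor labels vote with an inverse-distance weight (assumption).
        for dist, j in neighbors:
            weight = 1.0 / (1.0 + dist)
            for label in crowd_labels[j]:
                scores[label] += weight

        if scores:
            # Ties are broken by the smallest label for determinism.
            aggregated.append(max(sorted(scores), key=scores.get))
        else:
            aggregated.append(None)  # no votes available at all
    return aggregated


# Toy data: two clusters; instance 2 has no labels, instance 5 has one noisy label.
features = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
crowd_labels = [[1], [1, 1], [], [0, 0], [0], [1]]
print(weighted_knn_vote(features, crowd_labels, k=2))
```

In this toy example the unlabeled instance receives a label from its neighbors, and the single noisy label on the last instance is outvoted by closer, consistently labeled neighbors. Plain majority voting could not label the first of these and would keep the noisy label on the second.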
Authors
李佳烨
余浩
Li Jiaye; Yu Hao (Guangxi Key Laboratory of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi 541004, China; School of Computer Science & Engineering, Central South University, Changsha 410083, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2020, Issue 4, pp. 973-976 (4 pages)
Application Research of Computers
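Funding
National Key Research and Development Program of China (2016YFB1000905)
National Natural Science Foundation of China (61170131, 61263035, 61573270, 90718020)
National "973" Program of China (2013CB329404)
China Postdoctoral Science Foundation (2015M570837)
Guangxi Natural Science Foundation (2015GXNSFCB139011, 2015GXNSFAA139306)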
Keywords
crowdsourcing data
quality control
K-nearest neighbor voting
majority voting