Abstract
[Objective] This study addresses two problems in identifying participants in crowdsourcing contests: the identification index system is large, and the identification methods are limited. [Methods] After broadly collecting indicators of contest participants' crowdsourcing ability, we propose a recursive heuristic attribute reduction method and use it to construct a new crowd participant identification system. On that basis, we build a participant identification model with the random forests algorithm. [Results] Experiments show that the proposed attribute reduction method effectively lowers the data dimension, from 17 initial attributes to 8, and that the identification model built on the reduced 8-attribute system and random forests achieves higher identification accuracy. [Limitations] The identification model is relatively simple and needs further extension. The data come from domestic (Chinese) crowdsourcing contest websites, and their authenticity remains to be verified. [Conclusions] Introducing machine learning into participant identification for crowdsourcing contests enriches the available identification methods and improves identification efficiency.
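The abstract does not spell out the recursive heuristic, so the following is only a minimal sketch of one common style of attribute reduction: rough-set-like backward elimination that recursively drops an attribute whenever the remaining attributes still determine the class label equally well. The dependency measure, the attribute names (`wins`, `rating`, `region`), and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def dependency(rows, labels, attrs):
    """Fraction of rows whose values on `attrs` uniquely determine the label."""
    seen_labels = defaultdict(set)
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(label)
        counts[key] += 1
    consistent = sum(n for key, n in counts.items() if len(seen_labels[key]) == 1)
    return consistent / len(rows)


def reduce_attributes(rows, labels, attrs):
    """Recursively drop one attribute at a time while dependency is preserved.

    Hypothetical heuristic: the first attribute (in list order) whose removal
    leaves the dependency unchanged is dropped, then the procedure recurses.
    """
    base = dependency(rows, labels, attrs)
    for a in attrs:
        rest = [b for b in attrs if b != a]
        if rest and dependency(rows, labels, rest) >= base:
            return reduce_attributes(rows, labels, rest)
    return list(attrs)


# Toy data (hypothetical): every combination of three binary attributes.
attrs = ["wins", "rating", "region"]
rows = [
    dict(zip(attrs, values))
    for values in product(["high", "low"], ["high", "low"], ["east", "west"])
]
# Assumed labeling rule: a participant counts as "capable" only when both
# wins and rating are high; region carries no information.
labels = [int(r["wins"] == "high" and r["rating"] == "high") for r in rows]

reduced = reduce_attributes(rows, labels, attrs)
print(reduced)
```

On this toy data the uninformative `region` attribute is removed and the two informative attributes are kept. In a pipeline like the one the abstract describes, the retained attributes would then be passed to a classifier such as a random forest.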
Authors
周成
魏红芹
Zhou Cheng; Wei Hongqin (Glorious Sun School of Business and Management, Donghua University, Shanghai 200051, China)
Source
《数据分析与知识发现》
CSSCI
CSCD
Peking University Core Journals (北大核心)
2018, No. 7, pp. 46-54 (9 pages)
Data Analysis and Knowledge Discovery
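Funding
One of the research outcomes of the Donghua University Humanities and Social Sciences Prosperity Fund project "Research on a Multi-Granularity Model of User Requirements for Internet-Based Personalized Customization" (Project No. 108-10-0108076)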
Keywords
Crowd Participant Identification System (众包参与者识别体系)
Feature Reduction (属性约简)
Random Forests (随机森林)
Crowdsourcing Contests (众包竞赛)