Funding: Supported by the National Social Science Fund of China (Grant No. 22BTQ033).
Abstract: Context-Sensitive Tasks (CSTs) are a complex type of crowdsourcing task, such as handwriting recognition, route planning, and audio transcription. Current result inference algorithms perform well on simple crowdsourcing tasks but cannot obtain high-quality inference results for CSTs. The conventional approach divides a CST into multiple independent simple subtasks for crowdsourcing, but this ignores the contextual correlation among subtasks and lowers the quality of result inference. To solve this problem, we propose a result inference algorithm for CSTs based on the Partially ordered set and Tree augmented naive Bayes Infer (P&T-Inf). First, we screen the candidate results of context-sensitive tasks based on the partially ordered set. If there are parallel candidate sets, we calculate the conditional mutual information among subtasks using context information from external knowledge (such as the Google n-gram corpus and the American Contemporary English corpus). Combined with the tree augmented naive (TAN) Bayes model, a maximum weighted spanning tree is then used to model the dependencies among the subtasks in each CST. We collect two crowdsourcing datasets, of handwriting recognition tasks and audio transcription tasks, from a real crowdsourcing platform. The experimental results show that our approach improves the quality of result inference for CSTs and reduces the time cost compared with the latest methods.
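To make the TAN step more concrete, here is a minimal sketch of the generic TAN structure-learning procedure: estimate the conditional mutual information I(X_i; X_j | C) for every pair of attributes, then keep a maximum weighted spanning tree over those weights (Prim's algorithm here). It assumes all variables are discrete and that the statistics come from a table of sample tuples. P&T-Inf instead draws its context statistics from external corpora, and the abstract does not say how subtasks, candidates, and the class variable are encoded, so the helper names (`conditional_mutual_information`, `maximum_spanning_tree`, `tan_tree`) and the toy data are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of generic TAN structure learning; not the P&T-Inf code.
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(samples, i, j, c):
    """Empirical I(X_i; X_j | X_c) in nats, estimated from discrete sample tuples."""
    n = len(samples)
    n_ijc = Counter((s[i], s[j], s[c]) for s in samples)
    n_ic = Counter((s[i], s[c]) for s in samples)
    n_jc = Counter((s[j], s[c]) for s in samples)
    n_c = Counter(s[c] for s in samples)
    total = 0.0
    for (xi, xj, z), count in n_ijc.items():
        # p(xi,xj,z) * log[ p(xi,xj|z) / (p(xi|z) p(xj|z)) ], written with raw counts
        total += (count / n) * math.log(count * n_c[z] / (n_ic[(xi, z)] * n_jc[(xj, z)]))
    return total


def maximum_spanning_tree(num_nodes, weight):
    """Prim's algorithm on a complete graph; returns (parent, child) edges rooted at node 0."""
    # best[v] = (heaviest known edge weight linking v to the tree, tree endpoint)
    best = {v: (weight(0, v), 0) for v in range(1, num_nodes)}
    edges = []
    while best:
        v = max(best, key=lambda u: best[u][0])
        _, parent = best.pop(v)
        edges.append((parent, v))
        for u in best:  # relax remaining nodes through the newly added node v
            if weight(v, u) > best[u][0]:
                best[u] = (weight(v, u), v)
    return edges


def tan_tree(samples, attrs, cls):
    """Attribute tree of a TAN model: maximum spanning tree over I(X_a; X_b | C).
    In the full TAN model, the class C is additionally a parent of every attribute."""
    w = {}
    for a, b in combinations(range(len(attrs)), 2):
        w[a, b] = w[b, a] = conditional_mutual_information(samples, attrs[a], attrs[b], cls)
    edges = maximum_spanning_tree(len(attrs), lambda u, v: w[u, v])
    return [(attrs[p], attrs[q]) for p, q in edges]


# Hypothetical toy data: columns (x0, x1, x2, c); x1 copies x0, x2 is independent.
samples = [(a, a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(tan_tree(samples, attrs=[0, 1, 2], cls=3))  # [(0, 1), (0, 2)]
```

Because x1 is a copy of x0, their conditional mutual information is log 2 while every pair involving x2 scores 0, so the strongest dependency (0, 1) is always kept in the tree.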
Funding: This work was supported in part by the National Basic Research 973 Program of China under Grant Nos. 2015CB358700 and 2014CB340304, the National Natural Science Foundation of China under Grant No. 61421003, and the Open Fund of the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2017ZX-14.
Abstract: In traditional crowdsourcing, workers are expected to answer tasks independently so as to ensure the diversity of answers. However, recent studies show that the crowd is not a collection of independent workers; instead, workers communicate and collaborate with each other. To earn more rewards with little effort, some workers may collude to provide repeated answers, which damages the quality of the aggregated results. Nonetheless, few efforts have considered the negative impact of collusion on result inference in crowdsourcing. In this paper, we are specifically concerned with the Collusion-Proof result inference problem for general crowdsourcing tasks on public platforms. To that end, we design a metric, the worker performance change rate, which identifies colluded answers by computing the difference in mean worker performance before and after removing the repeated answers. We then incorporate the collusion detection result into existing result inference methods to guarantee the quality of the aggregated results even when collusion occurs. We conducted an extensive set of evaluations of our approach on real-world and synthetic datasets. The experimental results demonstrate the superiority of our approach over state-of-the-art methods.
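The abstract does not define how worker performance or "repeated answers" are measured, so the following is only a minimal sketch of a worker performance change rate under hypothetical assumptions: a worker's performance is their agreement rate with the majority-vote answer (no ground truth is used), repeated answers are groups of workers who submitted identical answer sheets, and removal keeps one representative per group. The rate is the relative difference in mean worker performance before and after that removal. All function names and the toy data are illustrative, and no detection threshold is implied.

```python
# Hypothetical sketch of a "worker performance change rate"; the performance
# measure (agreement with majority vote) and the duplicate rule are assumptions.
from collections import Counter, defaultdict


def majority_vote(answers):
    """answers: {worker: {task: label}} -> {task: most frequent label}."""
    votes = defaultdict(Counter)
    for sheet in answers.values():
        for task, label in sheet.items():
            votes[task][label] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


def mean_performance(answers):
    """Mean per-worker agreement with the majority vote (assumed proxy for performance)."""
    truth = majority_vote(answers)
    scores = [sum(truth[t] == label for t, label in sheet.items()) / len(sheet)
              for sheet in answers.values()]
    return sum(scores) / len(scores)


def repeated_groups(answers):
    """Groups of workers (size >= 2) who submitted identical answer sheets."""
    groups = defaultdict(list)
    for worker, sheet in answers.items():
        groups[frozenset(sheet.items())].append(worker)
    return [workers for workers in groups.values() if len(workers) > 1]


def change_rate(answers, group):
    """Relative change in mean performance after dropping all but one sheet of `group`."""
    before = mean_performance(answers)
    pruned = {w: s for w, s in answers.items() if w not in group[1:]}
    after = mean_performance(pruned)
    return (after - before) / before


# Hypothetical toy data: h1-h4 are each wrong on one task, h5/h6 are identical and
# fully correct (truth is "A" everywhere), and k1-k5 copy the same all-"B" sheet.
tasks = ["t1", "t2", "t3", "t4"]
answers = {f"h{i}": {t: "B" if t == wrong else "A" for t in tasks}
           for i, wrong in enumerate(tasks, start=1)}
answers.update({w: {t: "A" for t in tasks} for w in ("h5", "h6")})
answers.update({w: {t: "B" for t in tasks} for w in ("k1", "k2", "k3", "k4", "k5")})

for group in repeated_groups(answers):
    print(group, round(change_rate(answers, group), 3))
# ['h5', 'h6'] 0.1
# ['k1', 'k2', 'k3', 'k4', 'k5'] 0.31
```

In this toy run the copied all-"B" sheets capture the majority vote, so dropping them flips every aggregated answer and yields a larger change rate (about 0.31) than dropping the honest duplicate (0.1). Identical honest sheets also form a "repeated" group, which is why the sketch scores each group separately rather than discarding every duplicate outright; how the rate is turned into a collusion decision is not stated in the abstract.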