
User opinion extraction based on adaptive crowd labeling with cost constraint

Cited by: 3
Abstract: User reviews contain rich opinion information of great reference value to potential customers and merchants. Opinion targets and opinion words are the core objects in user reviews, and their automatic extraction is a key task for intelligent applications built on user reviews. At present, this problem is mainly addressed by supervised extraction methods, which depend on high-quality labeled samples for model training; traditional manual labeling, however, is time-consuming, laborious, and costly. Crowdsourcing provides an effective way to build a high-quality training sample set, but the quality of labeling results is uneven due to factors such as the workers' knowledge background. To obtain high-quality labeled samples at a limited cost, an adaptive crowdsourcing labeling method based on evaluating workers' professional level was proposed to construct a reliable opinion target-opinion word dataset. Firstly, highly professional workers were identified at small cost. Then, a task distribution mechanism based on worker reliability was designed. Finally, an effective fusion algorithm for labeling results was designed using the dependency relationship between opinion targets and opinion words, and the final reliable results were generated by integrating the labeling results of different workers.
A series of experiments on real datasets shows that, compared with the GLAD (Generative model of Labels, Abilities, and Difficulties) model and the MV (Majority Vote) method, the proposed method improves the reliability of the constructed high-quality opinion target-opinion word dataset by about 10% under a small cost budget.
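The fusion step summarized above can be illustrated as a reliability-weighted vote over workers' labels. This is only a minimal sketch under assumed simplifications (a single categorical label per task, a fixed per-worker reliability score); the paper's actual algorithm additionally exploits the dependency between opinion targets and opinion words, which is not modeled here:

```python
from collections import defaultdict

def fuse_labels(labels, reliability):
    """Fuse crowdsourced labels for one task by reliability-weighted voting.

    labels:      dict mapping worker id -> the label that worker submitted
    reliability: dict mapping worker id -> estimated reliability in (0, 1]
    Returns the label whose supporting workers have the highest total weight.
    """
    scores = defaultdict(float)
    for worker, label in labels.items():
        # Each worker's vote counts in proportion to their estimated reliability;
        # unknown workers contribute zero weight.
        scores[label] += reliability.get(worker, 0.0)
    return max(scores, key=scores.get)

# Hypothetical example: three workers label the opinion word for the target "battery".
labels = {"w1": "long-lasting", "w2": "cheap", "w3": "long-lasting"}
reliability = {"w1": 0.9, "w2": 0.6, "w3": 0.7}
print(fuse_labels(labels, reliability))  # -> long-lasting
```

Under this scheme, a single highly reliable worker can outweigh several unreliable ones, which is the intuition behind distributing tasks preferentially to workers who were identified as highly professional in the first stage.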
Authors: ZHAO Wei; LIN Yuming; HUANG Taoyi; LI You (Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2019, No. 5, pp. 1351-1356 (6 pages)
Funding: National Natural Science Foundation of China (61562014, U1711263); Key Program of the Guangxi Natural Science Foundation (2018GXNSFDA281049); Guilin University of Electronic Technology Excellent Graduate Dissertation Cultivation Project (16YJPYSS15); Guilin University of Electronic Technology Graduate Education Innovation Program (2018YJCX48); Research Project of the Guangxi Key Laboratory of Trusted Software (kx201916)
Keywords: opinion mining; crowdsourcing computation; cost constraint; worker measurement; data integration

