Abstract
Large-scale annotated corpora play an important role in natural language processing (NLP) research, supporting work on semantic understanding and algorithm development. To address the lack of an event-annotated corpus for Uyghur, and given that annotation requires only simple human intelligence, this paper proposes a crowdsourcing-based method for annotating Uyghur events. After formulating a Uyghur event annotation specification, the authors build a three-layer annotation platform and propose error-correction and quality-control mechanisms to ensure annotation quality. The resulting Uyghur event-annotated corpus provides an important resource for research on Uyghur events.
Source
《新疆大学学报(自然科学版)》
CAS
PKU Core Journal (北大核心)
2015, Issue 2, pp. 209-214, 220 (7 pages in total)
Journal of Xinjiang University(Natural Science Edition)
Funding
National Natural Science Foundation of China (61331011, 61262060)
National Basic Research Program of China (973 Program) (2014CB340506)
Keywords
Event
Uyghur
Annotation Corpus
Crowdsourcing