Abstract
To meet the needs of Tibetan sentiment analysis, a Tibetan sentiment corpus is built. Construction proceeds in three main steps: crawling the raw corpus, developing an annotation platform, and building the structured corpus. For the annotation scheme, the strengths of the annotation schemes used by well-regarded English and Chinese sentiment corpora are combined with the characteristics of Tibetan sentiment text to establish an annotation specification for Tibetan sentiment data. Experiments show that the corpus is extensible and practical, and that annotating Tibetan words and sentences on the platform reduces annotators' workload while effectively producing a structured corpus that meets the needs of sentiment analysis.
Authors
杨欣
群诺
郭龙银
孟姚媛
Yang Xin; Qun Nuo; Guo Longyin; Meng Yaoyuan (School of Information Science and Technology, Tibet University, Lhasa, Tibet 850000, China)
Source
《计算机时代》
2019, No. 9, pp. 5-7, 12 (4 pages in total)
Computer Era
Funding
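2018 College Student Innovation and Entrepreneurship Training Program project "Establishment and Analysis of a Tibetan Sentiment Corpus" (2018XCX046)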
Keywords
Tibetan
emotional corpus
labeling platform
sentiment tagging