Abstract
This paper introduces the construction of the Chinese Semantic Orientation Corpus (CSOC), covering its research background, design rationale, annotation scheme and methods, and processing steps. The CSOC is a synchronic, unbalanced, monolingual annotated corpus built for research on subjective expressions in language. It is designed according to a multidimensional descriptive system of linguistic subjectivity, comprises one million Chinese characters, and ships with a dedicated corpus toolkit that integrates concordance retrieval, statistics, result checking, and visualization. The corpus is characterized by high annotation quality, strong linguistic motivation, and usability for both linguistics and natural language processing.
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal (北大核心)
2014, No. 5, pp. 74-82 (9 pages)
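Funding
Humanities and Social Sciences Research Project of the Ministry of Education (11YJC740127)
Scientific Research Project for Outstanding Young Scholars of the Hunan Provincial Department of Education (14B068)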
Keywords
semantic orientation
corpus
subjectivity
construction