Abstract
Annotation is a central and difficult task in corpus linguistics research. Apart from part-of-speech tagging, other types of annotation (semantic, syntactic, discourse, pragmatic and so on) are hard to produce in batches or automatically. This article outlines the three main functions of PowerGREP (search, edit and replace, and collect) and the regular-expression knowledge most relevant to corpus processing. Taking the batch deletion, addition and modification of annotation codes in the British National Corpus (BNC) as examples, it then shows how PowerGREP can be used to process a corpus automatically or semi-automatically.
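The three batch operations named in the abstract (deleting, modifying and adding annotation codes) can be sketched with ordinary regular expressions. The sketch below is not taken from the article and does not use PowerGREP; it uses Python's standard `re` module as a stand-in. The sample string is a simplified, BNC-XML-like fragment assumed for illustration, and the `sem` attribute and the function names are hypothetical.

```python
import re

# Hypothetical sample in a simplified BNC-XML-like style (assumed for illustration only).
SAMPLE = (
    '<w c5="AT0" hw="the">The </w>'
    '<w c5="NN1" hw="corpus">corpus </w>'
    '<w c5="VVZ" hw="grow">grows</w>'
)


def delete_codes(text):
    """Delete every <w ...> and </w> code, keeping only the words."""
    return re.sub(r'</?w\b[^>]*>', '', text)


def modify_codes(text, old, new):
    """Change the c5 value `old` to `new` in every word code."""
    pattern = r'(<w\b[^>]*\bc5=")' + re.escape(old) + r'(")'
    # A function replacement avoids backreference clashes when `new` contains digits.
    return re.sub(pattern, lambda m: m.group(1) + new + m.group(2), text)


def add_attribute(text, c5, name, value):
    """Add name="value" to every <w> code whose c5 value equals `c5`."""
    pattern = r'(<w\b[^>]*\bc5="' + re.escape(c5) + r'"[^>]*)>'
    return re.sub(pattern, lambda m: f'{m.group(1)} {name}="{value}">', text)


if __name__ == "__main__":
    print(delete_codes(SAMPLE))
    print(modify_codes(SAMPLE, "NN1", "NN0"))
    print(add_attribute(SAMPLE, "NN1", "sem", "X"))
```

The same patterns should carry over to any regex-capable search-and-replace tool with little change, although the syntax for referring to captured groups in the replacement text differs between tools.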
Source
《外语电化教学》
CSSCI
2010, No. 3, pp. 57-62 (6 pages)
Technology Enhanced Foreign Language Education