Abstract
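Taking the Chinese character error interlanguage corpus of Sun Yat-sen University as an example, this paper discusses four issues that deserve attention when building such interlanguage corpora: the authenticity and continuity of the corpus data; principled annotation of the data, especially the tagging of character errors; the ease of use of the retrieval tool; and the supporting subsystems. Drawing on practical experience in building the corpus, it also offers methods and suggestions that others may find useful.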
The paper reports the preliminary findings of the character error-coded Chinese Interlanguage Corpus of Sun Yat-Sen University. The corpus is used to illustrate some theoretical issues in interlanguage corpus building. The first issue is the authenticity and continuity of the corpus. The second is principled tagging, especially the tagging of character errors. The wrong characters are created with the Truetype Character Editor in Windows and are stored and displayed as images; the characters can be edited. The third issue is that the retrieval tool should be multifunctional and user-friendly to guarantee efficient use of the corpus data. The last issue is the development of the corpus subsystems.
Source
《语言文字应用》
CSSCI
Peking University Core Journals (北大核心)
2012, No. 2, pp. 131-136 (6 pages)
Applied Linguistics
Funding
Supported by a Youth Project of the National Social Science Fund of China (10cyy020)
Keywords
tagging of character errors (汉字偏误标注)
interlanguage corpus (中介语语料库)
annotation (标注)