Abstract
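This paper proposes a plan for building a Chinese Braille corpus. Current Chinese Braille has distinctive conventions for word segmentation and joined writing and for tone marking. Because of these conventions, research on Chinese Braille cannot directly use corpora of print text written for sighted readers, and a dedicated corpus is needed. The proposed corpus is large-scale (about 10 million Braille cells). It is annotated at multiple levels, with linguistic information and tactile information aligned against each other. Building the corpus will give a fuller picture of how Braille has developed in China. It will also advance basic research on Braille and research on its digitization and standardization, and help improve information accessibility for blind people. The paper sets out the main content and preliminary plan of the corpus in several areas:
- principles for material selection and sample collection;
- annotation guidelines and the annotation scheme;
- the development plan for supporting software.
It also identifies the key and difficult issues in the construction.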
The paper presents a design for a Chinese Braille corpus. Such a corpus is needed because Chinese Braille differs from print Chinese in word segmentation and joined writing and in tone marking. The corpus designed in this paper contains about 10 million Braille cells and is annotated with linguistic and tactile information. It is of both theoretical and practical significance: it deepens the understanding of Chinese Braille, promotes fundamental research, digitization research and standardization research, and improves the accessibility of Braille. The paper explains the construction of the corpus in terms of material selection and collection, corpus annotation, and assistant software development. It also points out important and difficult issues in the construction process.
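The abstract describes a corpus whose size is measured in Braille cells and whose annotation aligns a linguistic layer with a tactile layer. The Python below is a minimal sketch of how one aligned unit and a cell count could be represented. It assumes Unicode Braille Patterns (U+2800 to U+28FF) as the storage encoding. The names `AlignedWord`, `cell_from_dots`, `dots_from_cell`, `count_cells` and `render` are hypothetical, not the paper's annotation scheme. The dot patterns in the demo are placeholders, not real Chinese Braille spellings.

```python
from dataclasses import dataclass

# Unicode "Braille Patterns" block: U+2800 (blank cell) to U+28FF.
# Dot n (1-8) sets bit n-1 of the offset from U+2800.
BRAILLE_BASE = 0x2800
BRAILLE_LAST = 0x28FF


def cell_from_dots(dots: frozenset[int]) -> str:
    """Render one cell, given as a set of raised dot numbers, as a Unicode character."""
    if not all(1 <= d <= 8 for d in dots):
        raise ValueError(f"dot numbers must be 1-8, got {sorted(dots)}")
    return chr(BRAILLE_BASE + sum(1 << (d - 1) for d in dots))


def dots_from_cell(cell: str) -> frozenset[int]:
    """Inverse of cell_from_dots: recover the raised dots of a Unicode Braille character."""
    offset = ord(cell) - BRAILLE_BASE
    if not 0 <= offset <= 0xFF:
        raise ValueError(f"{cell!r} is not in the Braille Patterns block")
    return frozenset(d for d in range(1, 9) if offset & (1 << (d - 1)))


def count_cells(braille_text: str) -> int:
    """Count non-blank Braille cells, skipping U+2800 and non-Braille characters.

    Assumption (hypothetical): blank cells serve as word separators and are
    not counted toward corpus size.
    """
    return sum(1 for ch in braille_text if BRAILLE_BASE < ord(ch) <= BRAILLE_LAST)


@dataclass
class AlignedWord:
    """Hypothetical annotation unit aligning a print word with its Braille cells."""
    hanzi: str                   # linguistic layer: the word in Chinese characters
    pinyin: str                  # linguistic layer: its pinyin reading
    cells: list[frozenset[int]]  # tactile layer: raised dots of each cell

    def braille(self) -> str:
        return "".join(cell_from_dots(c) for c in self.cells)


def render(words: list[AlignedWord]) -> str:
    """Join words with a blank cell, mimicking word-segmented Braille writing."""
    return "\u2800".join(w.braille() for w in words)


if __name__ == "__main__":
    # Placeholder dot patterns only; these are NOT real Chinese Braille spellings.
    demo = [
        AlignedWord("词一", "cí yī", [frozenset({1}), frozenset({1, 2})]),
        AlignedWord("词二", "cí èr", [frozenset({1, 4, 5})]),
    ]
    text = render(demo)
    print(text, count_cells(text))
```

The tactile layer is kept as dot sets rather than as encoded characters. Under this assumption, the annotation stays readable as physical dot positions, and Unicode serves only as the storage and display form. Whether blank cells should count toward a figure such as "10 million cells" is left open here.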
Source
《语言文字应用》 (Applied Linguistics)
CSSCI
Peking University Core Journal
2015, No. 3, pp. 109-118 (10 pages)
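Funding
Major Project of the National Social Science Fund of China, "Research on the Construction of a Chinese Braille Corpus" (No. 13&ZD187)
Research Project of the State Language Commission, "Research on Strategies for Integrating and Applying Language Resources Based on a Cloud Computing Platform" (No. YB125-38)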
Keywords
Chinese Braille
Braille corpus
material selection
corpus annotation