Abstract
Drawing on the basic characteristics of Middle Chinese and on experience from building existing corpora, this paper sets out several principles for selecting texts for a Middle Chinese corpus: representativeness of the text samples, balance among text types, correlation and distinctiveness between texts, and the distinctive character of the documents included. It then discusses the feasibility of establishing a word segmentation specification and a segmentation word list for a Middle Chinese corpus, and sketches a preliminary overall framework for a "Word Segmentation Specification for Middle Chinese for Information Processing".
Source
Journal of Southwest University (Social Sciences Edition) (《西南大学学报(社会科学版)》)
CSSCI
Peking University Core Journals (北大核心)
2014, No. 3, pp. 136-142, 184 (7 pages in total)
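Funding
National Social Science Fund of China Major Project "Research on the Construction of a Corpus for the History of the Chinese Language" (10&ZD117)
Principal investigator: Dong Zhiqiao (董志翘)
Ministry of Education Humanities and Social Sciences Planning Project "Research on Agricultural Vernacular Words in Middle and Early Modern Chinese" (10YJA740033)
Principal investigator: Hua Zhenhong (化振红)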