Abstract
The series Grammatically Labeled Texts of Ethnic Languages in China (《中国民族语言语法标注文本》) is the first large-scale collection of authentic text resources of its kind in China. It covers more than ten low-resource ethnic languages of China, and its in-depth grammatical annotation gives it considerable academic value, so it has attracted wide interest in the academic community. Given the importance of implementing retrieval technology for large-scale annotated texts, this paper introduces the content of the project, the process of its technical implementation, and the search functions that can be expected. It explains in particular detail the technology used to produce aligned interlinear texts in the internationally accepted format, so that readers can form an objective and clear picture of how the series is being digitized and made searchable before the project goes online.
Authors
Jiang Di (江荻)
Long Congjun (龙从军)
JIANG Di; LONG Congjun (Center for Studies of Chinese and Sino-Tibetan Languages, Jiangsu Normal University, Xuzhou 221116, China; Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081, China; School of Chinese Language and Literature, University of Chinese Academy of Social Sciences, Beijing 100081, China)
Source
《云南师范大学学报(哲学社会科学版)》
Peking University Core Journal (北大核心)
2023, No. 6, pp. 36-44 (9 pages)
Journal of Yunnan Normal University:Humanities and Social Sciences Edition
Funding
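Major Project of the National Social Science Fund of China, "Research on the Development and Construction of an Online Retrieval System for Large-Scale Grammatically Annotated Texts of Ethnic Languages in China" (“中国民族语言大规模语法标注文本在线检索系统研制与建设研究”, 21&ZD304).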
Keywords
ethnic languages (民族语)
annotated texts (标注文本)
corpus data (语料数据)
retrieval technology (检索技术)