我国民族语言数据和语言知识服务的理念及实现途径

Concepts and Realization of National Language Data and Language Knowledge Service in China
Abstract Before the era of large language models, ethnolinguists obtained research data mainly by manually searching published works that recorded the words and sentences of ethnic languages. From acquisition through compilation, this process ran into practical problems: data were difficult to collect, information was incomplete, and the data were scattered and unsystematic. Language data processing has now entered the era of large models, and these problems can be addressed by collecting, sorting, and storing data and organizing them systematically into databases. However, ethnic language resources are relatively scarce, access channels are limited, and interpreting and analyzing ethnic language data requires strong linguistic expertise. As a result, large models perform poorly on ethnic language data. To date, there is no publicly available large-scale professional database of China's ethnic languages in academia. To exploit the advantages of large models and to solve the problems they face in processing ethnic languages, building ethnic language data and language knowledge services centered on human-computer collaboration should be an effective way to carry out ethnic language research and inheritance in the Internet era. Ethnic language data and language knowledge services play an important role in humanities and social research, traditional ethnic science and technology, cultural protection and inheritance, and the exploration of Chinese cultural genes. Taking ethnic language data and knowledge services as its starting point, this paper constructs professional data resources and a series of knowledge bases for research on ethnic languages and cultures. Digital humanities technology is used to digitize important literature in ethnolinguistics, and knowledge graph technology is used to link knowledge across domains, forming a literature retrieval and knowledge service platform. Data are collected under seven categories: ethnic language dictionaries, language sketches, endangered languages, grammatical annotation, reference grammars, papers, and other. The literature database contains more than 150 works and links more than 200 grammatical category concepts across the ethnic languages, and the results of knowledge association for the case category are analyzed. Combining ethnic language data with digital humanities technology can effectively solve the problems ethnolinguists face in collecting materials and analyzing corpora. Mining, statistical analysis, and computation over the data help ethnolinguists grasp, accurately and comprehensively, the syntagmatic and paradigmatic relations of phonetics and grammar within a given ethnic language system, as well as the differences between ethnic language systems. Visualizing the analysis results also helps ethnolinguists form a systematic understanding of regularities in phonetics, grammar, vocabulary, and other levels within one or more ethnic languages, and promotes deeper research in related fields of ethnolinguistics. This paper proposes only one model of professional data and linguistic knowledge services for ethnolinguistic research, as a reference for building ethnic language data and linguistic knowledge bases. Preliminary findings show that the accuracy, consistency, and standardization of ethnic language data deserve attention. China's ethnic languages are typologically very rich, and the diversity of languages carries the diversity of cultures. The associations among items of language knowledge reveal the commonalities and differences among ethnic languages and cultures, and prompt researchers to reflect on and explore the genetic relationships among ethnic languages and their cultural mutual learning.
Author LONG Congjun (龙从军)
Source Jinan Journal (Philosophy and Social Sciences) (《暨南学报(哲学社会科学版)》), PKU Core Journal, 2024, No. 6, pp. 15-30 (16 pages)
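Funding Major Project of the National Social Science Fund of China, "Research on the Development and Construction of an Online Retrieval System for Large-Scale Grammatically Annotated Texts of China's Ethnic Languages" (21&ZD304); Laboratory Incubation Special Funding Project of the Chinese Academy of Social Sciences, "Computational Research on Common Features Based on Multimodal Data of Ethnic Languages" (2024SYFH008).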
Keywords language data; language knowledge; case category; knowledge service; digital humanities