Abstract
This article describes the linguistic diversity of Tibet and attempts to classify the languages of Tibet with an automatic computer clustering method. It codes and weights linguistic forms for edit-distance calculation, applies a clustering algorithm, and presents the result as a tree diagram. The data cover 49 varieties in total: Tibetan and its dialects in Tibet, together with the non-Tibetan languages of the Monpa, Lhoba, Deng, and other ethnic groups, including some local varieties that have not yet been identified. The results show that the Tibet Autonomous Region is home to both Tibetan and non-Tibetan Tibeto-Burman languages. Tibetan comprises the traditionally recognized dBus-gtsang and Khams dialects, and dBus-gtsang can be further divided into the dBus, gTsang, mNgav-ris, and southern subdialects. The non-Tibetan Tibeto-Burman languages include the Monpa and Lhoba languages, the Yidu-family languages, and local varieties of unknown genetic affiliation in eastern and south-eastern Tibet. Compared with the traditional classification, the automatic classification of languages and dialects used in this article is quite reasonable and highly reliable.
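The abstract names the pipeline (coding and weighting forms for edit distance, clustering, tree display) but not its details. The sketch below illustrates that kind of pipeline under explicit assumptions: plain unit-cost Levenshtein distance on transcribed forms, normalized by the longer form and averaged over a shared concept list, followed by average-linkage (UPGMA) clustering. The paper's actual coding and weighting scheme and its clustering algorithm are not specified here; UPGMA is a common choice in dialectometry but is an assumption. The function names `levenshtein`, `variety_distance`, and `upgma` are hypothetical, and the word lists are invented toy data, not any of the 49 varieties studied.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    # Assumption: divide by the longer form so each distance lies in [0, 1].
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def variety_distance(x: dict, y: dict) -> float:
    # Average over concepts attested in both word lists (assumed non-empty).
    shared = [c for c in x if c in y]
    return sum(normalized_distance(x[c], y[c]) for c in shared) / len(shared)


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    size = {lab: 1 for lab in labels}  # cluster -> number of leaves it holds
    d = {frozenset(p): dist(*p) for p in combinations(labels, 2)}
    while len(size) > 1:
        # Closest pair; sorting by str keeps the output order deterministic.
        a, b = sorted(min(d, key=d.get), key=str)
        merged = (a, b)
        for c in size:
            if c not in (a, b):
                # Size-weighted mean of the merged clusters' distances to c.
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        # Drop all entries that still refer to the two old clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Invented toy word lists (NOT data from the paper): concept -> form.
toy = {
    "A": {"c1": "kata", "c2": "mipo", "c3": "suna"},
    "B": {"c1": "kada", "c2": "mibo", "c3": "suna"},
    "C": {"c1": "tolu", "c2": "ranek", "c3": "pisa"},
    "D": {"c1": "tolo", "c2": "ranik", "c3": "pise"},
}

tree = upgma(list(toy), lambda x, y: variety_distance(toy[x], toy[y]))
print(tree)  # → (('A', 'B'), ('C', 'D'))
```

The nested tuple is the tree structure that a dendrogram plot would draw. With real data, the coding step (how phonetic segments are transcribed and how substitutions are weighted) would replace the unit costs in `levenshtein`.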
Source
China Tibetology
CSSCI
Peking University Core Journal
2022, No. 6, pp. 150-160, 219 (12 pages)
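Funding
Interim result of the Major Project of the National Social Science Fund of China "Research on the Development and Construction of an Online Retrieval System for Large-Scale Grammatically Annotated Texts of China's Ethnic Languages" (Project No. 21&ZD304)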
Keywords
Tibet
Linguistic diversity
Edit distance
Automatic classification