A New Study on Tibetan Phrase Classification Based on a Treebank (基于句法树库的藏语短语分类问题新探)
Abstract Statistical, corpus-based linguistics has become the main method in language information processing research. From the perspective of language information processing and guided by phrase structure theory, this study builds a fairly complete syntactic treebank. First, 10,000 Tibetan sentences were selected from articles covering a wide range of topics and genres, and their words were classified and tagged by part of speech. According to the grammatical function and semantic relations of phrases within sentences, and drawing on the part-of-speech classification standard and tagging specification for Tibetan information processing, that classification system and standard were extended to the level of phrase structure: the criteria for delimiting phrases were standardized, a tagging scheme was formulated, and a fairly complete phrase-structure treebank was built. Second, the concept of the Tibetan phrase for natural language processing was defined on the basis of the theories and methods of modern linguistics and computational linguistics, with reference to existing research on other languages such as English and Chinese and to traditional Tibetan grammar. The structure and function of Tibetan phrases were described comprehensively and systematically, the combination forms of phrases were described formally, and the syntactic information of the various phrase structures was summarized in detail. On this basis, eight types of phrase structure were extracted and counted, giving the statistical distribution of phrase structures. Finally, a phrase structure rule model was constructed, and a structure information base containing 49,328 Tibetan phrases was built with Python. Applying theories of phrase and syntactic analysis, and by examining the grammatical structure and semantic relations between words and between phrases in sentences, Tibetan phrase structures were classified by syntactic function and described by semantic attribute. The goal is to build an annotated treebank of Tibetan phrase structure, to summarize the types and functions of Tibetan phrase structures, and to complete a Tibetan phrase structure database organized mainly by syntactic function while also taking semantic features into account. The database is an important reference for theoretical and technical research on Tibetan syntactic parsing and machine translation.
Author Rin-Chen-sGrol-Ma (仁青卓么) (School of Computer Science, Qinghai Normal University, Xining, 810016)
Source Journal of Qinghai Normal University (Tibetan language) (《青海师范大学学报(藏文版)》), 2021, No. 4, pp. 74-83 (10 pages)
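Funding National Natural Science Foundation project "Research on Deep-Learning-Based Construction Techniques for a Tibetan Phrase Structure Annotated Treebank" (62007019); Qinghai Province Applied Basic Research Program project "Research on Automatic Proofreading of Tibetan Electronic Text" (2021-ZJ-727); Qinghai Normal University Young and Middle-Aged Researchers Fund project "Recognition of Tibetan Syllable Component Parts and Construction of Its Knowledge Base" (2019zr013).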
Keywords treebank; Tibetan phrase; classification
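
The abstract describes extracting phrase structures from an annotated treebank with Python and counting their distribution, but it does not give the file format or the tag set. The following is only a minimal sketch of that kind of extraction, assuming Penn-style bracketed trees; the labels (`S`, `NP`, `VP`, `n`, `v`, `a`), the placeholder words (`w1`, `w2`, ...) and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree such as '(NP (n w1))' into (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree


def phrase_rules(tree):
    """Yield (label, rule) for every node that dominates at least one other node.

    Nodes that dominate only a word (part-of-speech tags) are skipped.
    """
    label, children = tree
    sub_nodes = [c for c in children if isinstance(c, tuple)]
    if sub_nodes:
        rhs = " ".join(c[0] if isinstance(c, tuple) else c for c in children)
        yield label, f"{label} -> {rhs}"
    for child in sub_nodes:
        yield from phrase_rules(child)


def count_phrases(bracketed_sentences):
    """Count phrase labels and rewrite rules over a list of bracketed trees."""
    label_counts, rule_counts = Counter(), Counter()
    for text in bracketed_sentences:
        for label, rule in phrase_rules(parse_tree(text)):
            label_counts[label] += 1
            rule_counts[rule] += 1
    return label_counts, rule_counts


# Hypothetical toy input; w1..w6 stand in for Tibetan words.
sample = [
    "(S (NP (a w1) (n w2)) (VP (NP (n w3)) (v w4)))",
    "(S (NP (n w5)) (VP (v w6)))",
]
label_counts, rule_counts = count_phrases(sample)
for rule, freq in rule_counts.most_common():
    print(freq, rule)
```

In this sketch the sentence-level root `S` is counted along with the phrase categories; a real tag set would decide which labels count as phrases, and a real pipeline would read the trees from the treebank files rather than from an inline list.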