Abstract
Dependency parsing is a key component of natural language processing. Vietnamese phrase structure trees have been studied fairly extensively, whereas research on Vietnamese dependency structure trees remains scarce. This paper proposes a new method that combines the linguistic characteristics and grammatical features of Vietnamese with a head percolation table and statistical methods to convert Vietnamese phrase structure trees into dependency structure trees. First, a list of dependency relations is drawn up on the basis of the Chinese dependency annotation scheme and Vietnamese grammar rules. Second, a head percolation table is constructed according to the characteristics of Vietnamese and used for a preliminary conversion. Finally, a dependency relation tagger assigns the dependency labels. The converted dependency trees are then used to train MSTParser, which yields additional Vietnamese dependency trees. In a sampled evaluation of the experimental results, the treebank conversion reaches an accuracy of 89.4%, indicating that the method handles the conversion of Vietnamese phrase structure trees to dependency trees reasonably well.
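The preliminary conversion step can be illustrated with a minimal sketch of head percolation. It is not the authors' implementation: the `Node` class, the three entries of `HEAD_TABLE`, the phrase and part-of-speech labels, and the example sentence are all assumptions made for illustration, and the sketch produces unlabeled arcs only (the dependency relation tagger described in the abstract is not modeled).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                      # phrase label (internal node) or POS tag (leaf)
    word: Optional[str] = None      # set only on leaves
    index: int = 0                  # 1-based token position, leaves only
    children: List["Node"] = field(default_factory=list)


# Hypothetical head percolation table (NOT the table from the paper):
# parent label -> (search direction, child labels in priority order).
HEAD_TABLE: Dict[str, Tuple[str, List[str]]] = {
    "S": ("left", ["VP", "NP"]),
    "VP": ("left", ["V", "VP"]),
    "NP": ("left", ["N", "NP", "P"]),
}


def head_child(node: Node) -> Node:
    """Pick the head child of an internal node using the table."""
    direction, priorities = HEAD_TABLE.get(node.label, ("left", []))
    kids = node.children if direction == "left" else node.children[::-1]
    for wanted in priorities:
        for kid in kids:
            if kid.label == wanted:
                return kid
    return kids[0]  # fallback: first child in the search direction


def convert(node: Node, arcs: Dict[int, int]) -> int:
    """Return the token index of node's lexical head; record dependent -> head."""
    if node.word is not None:
        return node.index
    head = head_child(node)
    head_idx = convert(head, arcs)
    for kid in node.children:
        if kid is not head:
            # The lexical head of every non-head child depends on the head word.
            arcs[convert(kid, arcs)] = head_idx
    return head_idx


def to_dependencies(root: Node) -> Dict[int, int]:
    """Map each token index to its head index (0 marks the sentence root)."""
    arcs: Dict[int, int] = {}
    arcs[convert(root, arcs)] = 0
    return arcs


# Illustrative tree: (S (NP (P Tôi)) (VP (V đọc) (NP (N sách))))
tree = Node("S", children=[
    Node("NP", children=[Node("P", "Tôi", 1)]),
    Node("VP", children=[
        Node("V", "đọc", 2),
        Node("NP", children=[Node("N", "sách", 3)]),
    ]),
])
arcs = to_dependencies(tree)
print(sorted(arcs.items()))  # [(1, 2), (2, 0), (3, 2)]
```

In this toy example the verb becomes the root and both nouns attach to it. A real table would need one entry per phrase label in the source treebank, and the resulting arcs would still have to be labeled in a separate step.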
Authors
LI Ying; GUO Jianyi; YU Zhengtao; MAO Cunli; XIAN Yantuan (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, China)
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
Peking University Core Journals
2017, No. 4, pp. 599-607 (9 pages)
Funding
National Natural Science Foundation of China, Nos. 61262041, 61363044, 61472168
Key Project of the Natural Science Foundation of Yunnan Province, No. 2013FA030
Keywords
syntactic analysis
head percolation table
phrase structure
dependency structure
treebank