
Constructing a Vietnamese Phrase Treebank by Fusing Vietnamese Grammatical Features with an Improved PCFG (cited by: 4)
Abstract: Phrase treebanks are an important resource for natural language processing (NLP) research and applications. Vietnamese currently lacks such a resource, which hinders Chinese-Vietnamese bilingual information processing. This paper proposes a method for constructing a Vietnamese phrase treebank that fuses Vietnamese grammatical features with an improved probabilistic context-free grammar (PCFG) model. The method automatically derives phrase structure trees for Vietnamese sentences and thereby addresses the problem of building the treebank automatically. First, a Vietnamese grammatical feature set is established by analyzing the linguistic characteristics of Vietnamese. Then, the grammar rule set of the PCFG model is obtained with the Inside-Outside algorithm from a small number of manually annotated Vietnamese phrase trees. At the same time, the traditional PCFG model is improved by adding contextual information in the form of pre co-occurrence and post co-occurrence probabilities. Finally, the grammatical feature set is fused into the improved PCFG model as a supplement to the grammar rule set, and the resulting model is used to construct the treebank. The authors report that the improved model both raises accuracy and reduces parsing time for Vietnamese syntactic analysis. Experimental results show that the proposed model reaches an accuracy of 81.14% on Vietnamese phrase treebank construction, 2% to 3% higher than the conventional PCFG model and a maximum-entropy-based treebank construction method.
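The abstract names the Inside-Outside algorithm without giving details. As a rough illustration only, the sketch below computes inside probabilities (the first pass of Inside-Outside) for a toy PCFG in Chomsky normal form. The grammar, its probabilities, the example sentence, and the function names are all hypothetical and are not taken from the paper; the paper's co-occurrence probabilities and grammatical feature fusion are not modeled here.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (NOT from the paper).
# Binary rules: (parent, (left child, right child)) -> probability.
BINARY_RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 1.0,
}
# Lexical rules: (preterminal, word) -> probability.
LEXICAL_RULES = {
    ("NP", "tôi"): 0.5,  # "I"
    ("NP", "cơm"): 0.5,  # "rice"
    ("V", "ăn"): 1.0,    # "eat"
}


def inside_probabilities(words):
    """Return chart[(i, j)][A] = P(A derives words[i:j]) under the toy PCFG."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))
    # Spans of width 1: apply lexical rules A -> word.
    for i, word in enumerate(words):
        for (lhs, terminal), prob in LEXICAL_RULES.items():
            if terminal == word:
                chart[(i, i + 1)][lhs] += prob
    # Wider spans: sum over every split point k and binary rule A -> B C.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (lhs, (left, right)), prob in BINARY_RULES.items():
                    left_p = chart[(i, k)].get(left, 0.0)
                    right_p = chart[(k, j)].get(right, 0.0)
                    if left_p and right_p:
                        chart[(i, j)][lhs] += prob * left_p * right_p
    return chart


def sentence_probability(words, start="S"):
    """Inside probability of the start symbol over the whole sentence."""
    chart = inside_probabilities(words)
    return chart[(0, len(words))].get(start, 0.0)
```

Under these assumed rules, `sentence_probability(["tôi", "ăn", "cơm"])` evaluates to 0.25, the product of the rule probabilities in the sentence's only parse. A full Inside-Outside estimator would add the outside pass and re-estimate rule probabilities from expected rule counts.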
Authors: Li Ying (李英), Guo Jianyi (郭剑毅), Yu Zhengtao (余正涛), Xian Yantuan (线岩团), Chen Wei (陈玮) (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, China)
Source: Journal of Nanjing University (Natural Science) 《南京大学学报(自然科学版)》, indexed in CAS, CSCD, and the Peking University Core Journals list; 2017, No. 2, pp. 357-367 (11 pages)
Funding: National Natural Science Foundation of China (61262041, 61363044, 61472168)
Keywords: Vietnamese; phrase structure tree; probabilistic context-free grammar; grammar rule set; treebank