
A Study on Constituent-to-Dependency Conversion

Cited by: 12
Abstract: The construction of Chinese dependency treebanks lags behind that of other languages, such as English, in both scale and quality. Annotating a treebank requires substantial human and material resources, and guaranteeing its quality is difficult. This paper explores a method that combines rule-based and statistical approaches to convert a constituent treebank, the Penn Chinese Treebank, into a dependency treebank that follows the annotation standard of the HIT Chinese Dependency Treebank (HIT-IR-CDT), thereby enlarging the existing dependency treebank. The converted trees are added to HIT-IR-CDT, and a dependency parser is trained and tested on the combined data. Experiments show that adding a small amount of converted data improves parser performance, whereas adding a large amount degrades it. Detailed analysis suggests that constituent-to-dependency conversion, as a way of using multiple treebanks to improve dependency parsing, still requires in-depth research.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 6, pp. 14-19 (6 pages)
Funding: Natural Science Foundation grants (60675034, 60575042); National 863 Program grant (2006AA01Z145)
Keywords: computer application; Chinese information processing; constituent-based treebank; dependency treebank; dependency parsing











