
Improving Shift-Reduce Chinese Parsing with an Uptraining Approach

Original title: 向上学习方法改进移进-归约中文句法分析. Cited by: 2
Abstract Shift-reduce parsers run in linear time, which makes them especially practical for large-scale parsing tasks such as parsing the Web. Their accuracy, however, is currently well below that of the best publicly available parsers, for example the Berkeley parser. This paper studies how uptraining and unlabeled data can improve a shift-reduce parser so that it comes as close as possible to the Berkeley parser's performance. The basic idea of uptraining is to apply a high-accuracy parser (the Berkeley parser in this paper) to automatically analyze large-scale unlabeled data, and then to use the resulting automatically labeled data as additional training data for a POS tagger and the shift-reduce parser. Experimental results on the Penn Chinese Treebank show that uptraining with unlabeled data improves shift-reduce parsing by an absolute 2.3%, to 82.4%. This is comparable to the Berkeley parser on the same data and outperforms other state-of-the-art parsers. The resulting parsing system also has a clear speed advantage: it is 7 times faster than the Berkeley parser.
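Source Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2015, No. 2, pp. 33-39 (7 pages)
Funding National Natural Science Foundation of China (61073140, 61100089); Fundamental Research Funds for the Central Universities (N110404012); Specialized Research Fund for the Doctoral Program of Higher Education (20100042110031)
Keywords Chinese syntactic parsing; shift-reduce parsing; Berkeley parser; uptraining; unlabeled data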
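
The uptraining loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `teacher_tag` is a hypothetical stand-in for the slow, accurate parser (the Berkeley parser in the paper), `train_tagger` is a toy most-frequent-tag model standing in for the fast POS tagger and shift-reduce parser, and the sentences and tags are invented examples. It shows only the data flow (automatically label unlabeled text, then retrain on gold plus automatic data), not the authors' implementation.

```python
from collections import Counter, defaultdict


def teacher_tag(sentence):
    """Hypothetical stand-in for the accurate but slow teacher model.

    In the paper this role is played by the Berkeley parser; here a fixed
    lexicon is used so that the sketch is self-contained.
    """
    lexicon = {"我": "PN", "他": "PN", "喜欢": "VV", "猫": "NN", "狗": "NN"}
    return [(word, lexicon.get(word, "NN")) for word in sentence]


def train_tagger(labeled_sentences):
    """Toy fast model: remember the most frequent tag seen for each word."""
    counts = defaultdict(Counter)
    for sentence in labeled_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def uptrain(gold, unlabeled, teacher, train):
    """Label the unlabeled data with the teacher, then train on gold + automatic data."""
    automatic = [teacher(sentence) for sentence in unlabeled]
    return train(gold + automatic)


# Invented toy data: one gold-annotated sentence and one unlabeled sentence.
gold = [[("我", "PN"), ("喜欢", "VV"), ("猫", "NN")]]
unlabeled = [["他", "喜欢", "狗"]]

baseline = train_tagger(gold)
uptrained = uptrain(gold, unlabeled, teacher_tag, train_tagger)

print("狗" in baseline)   # the baseline never saw this word
print(uptrained["狗"])    # the uptrained model learned it from the teacher's output
```

The design point is that the teacher is only needed offline, while building the extra training data; at parsing time only the fast model runs, which is why the speed advantage reported in the abstract is preserved.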


