
Chinese Statistical Parsing with Rich Linguistic Features (融合丰富语言知识的汉语统计句法分析)
Abstract: Knowledge acquisition has long been regarded as a bottleneck in many NLP tasks, such as machine translation and information extraction, and treebank-based statistical parsing is no exception. The linguistic knowledge latent in a treebank is very rich, but it cannot be acquired directly; specific strategies are usually needed to incorporate it into a model. Our Chinese statistical parsing model incorporates this knowledge in three ways. First, non-recursive noun phrases and non-recursive verb phrases are re-annotated in the Penn Chinese Treebank, because they are strong markers of boundaries. Second, a new head percolation table is designed based on Xia's table. Third, a context configuration frame is introduced to describe bilexical dependency structures more specifically. Together, these three features improve the F1 measure by 2.37% and the complete match ratio by 5.36%.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, PKU Core, 2005, No. 3, pp. 61-66 (6 pages)
Funding: National 863 Program of China (2003AA111010, 2001AA114010)
Keywords: artificial intelligence; natural language processing; statistical parsing; non-recursive phrases; head percolation table; context configuration frame
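
The abstract mentions a head percolation table without listing its rules. As a rough illustration of how such a table is generally used to pick the lexical head of each constituent, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Node` class, the `HEAD_TABLE` entries, and the `find_head` function are hypothetical, and the toy rules are neither the paper's table nor Xia's table.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A constituent (with children) or a POS-tagged leaf (with a word)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None


# Toy rules, invented for this sketch only (NOT the paper's table):
# parent label -> (scan direction, child labels in priority order).
HEAD_TABLE = {
    "IP": ("right", ["IP", "VP"]),
    "VP": ("left", ["VP", "VV"]),
    "NP": ("right", ["NP", "NN", "NR"]),
}


def find_head(node: Node) -> Node:
    """Return the leaf that serves as the lexical head of `node`."""
    if not node.children:
        return node
    direction, priorities = HEAD_TABLE.get(node.label, ("right", []))
    # "left" scans children left to right; "right" scans right to left.
    kids = node.children if direction == "left" else list(reversed(node.children))
    for label in priorities:
        for child in kids:
            if child.label == label:
                return find_head(child)
    # No rule matched: fall back to the first child in scan order.
    return find_head(kids[0])


# (IP (NP (NR 张三)) (VP (VV 喜欢) (NP (NN 音乐))))
tree = Node("IP", [
    Node("NP", [Node("NR", word="张三")]),
    Node("VP", [
        Node("VV", word="喜欢"),
        Node("NP", [Node("NN", word="音乐")]),
    ]),
])

head = find_head(tree)
print(head.label, head.word)
```

With these toy rules the head of the sentence percolates up from the verb, so bilexical dependencies would be formed between 喜欢 and the heads of its sibling phrases. Changing an entry in the table changes which word pairs the parser treats as dependencies, which is why the design of the table matters for a lexicalized model.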

