
Lexicalized Tree Adjoining Grammar Based Data Augmentation for Parsing
Abstract: Syntactic parsing is a fundamental technology in natural language processing. Mainstream data-driven neural parsing models require large-scale annotated data. Extending a treebank by manual annotation is costly, so augmenting data from existing annotated treebanks has become a research focus. For data augmentation in Chinese parsing, the sentences generated from a given annotated treebank must satisfy two conditions. First, they should have diverse and complete syntactic tree structures. Second, they should be semantically plausible. To this end, we propose, for the first time, a data augmentation method based on lexicalized tree adjoining grammar (LTAG).

For the first requirement, we design and implement an LTAG-based lexicalized tree extraction algorithm and a parse tree synthesis algorithm. The grammar allows "adjoining" and "substitution" operations between syntactic trees, and new syntactic trees are derived through these operations. Linguistic knowledge ensures that the generated sentences are grammatical and have complete syntactic tree structures. For the second requirement, we use a language model to evaluate the semantic plausibility of the generated sentences. We keep the plausible ones as the final augmented data, which yields a high-quality annotated treebank.

Taking Chinese as an example, we evaluate data augmentation for parsing on the Chinese treebank CTB5. In the small-sample setting (20% of CTB5), the augmented data improve dependency parsing accuracy by 1.39% and constituency parsing accuracy by 2.14%. For the robustness experiments, we build an extended test set. On this set, the augmented data improve dependency parsing accuracy by 1.43% and constituency parsing accuracy by 0.44%, indicating better robustness.
Authors: CHEN Hongbin, ZHANG Yujie, XU Jin'an, CHEN Yufeng (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
Source: Journal of Chinese Information Processing (《中文信息学报》), 2022, No. 10, pp. 27-37, 44 (12 pages in total). Indexed in CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (61876198, 61976015, 61976016).
Keywords: dependency parsing; constituency parsing; lexicalized tree adjoining grammar; language model; data augmentation
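The abstract describes two stages. First, new trees are derived with LTAG substitution and adjoining. Second, a language model filters the derived sentences. Below is a minimal Python sketch of both stages under simplifying assumptions:

- Trees are plain labeled nodes.
- Substitution fills the first open site whose label matches the initial tree's root.
- Adjoining applies at the first internal node whose label matches the auxiliary tree's root.
- The language model is a toy add-one-smoothed bigram model with an arbitrary perplexity threshold.

All names (`Node`, `substitute`, `adjoin`, `train_bigram`, `perplexity`), the example elementary trees, and the threshold are hypothetical. The paper's extraction and synthesis algorithms and its actual language model are not reproduced here.

```python
import math
from collections import Counter
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    label: str                                    # nonterminal (e.g. "NP") or a word
    children: List["Node"] = field(default_factory=list)
    subst: bool = False                           # open substitution site, written NP↓
    foot: bool = False                            # foot node of an auxiliary tree, written VP*

    def leaves(self) -> List[str]:
        if not self.children:                     # words; open sites contribute nothing
            return [] if (self.subst or self.foot) else [self.label]
        return [w for c in self.children for w in c.leaves()]


def find(n: Node, pred: Callable[[Node], bool]) -> Optional[Node]:
    """Return the first node in preorder that satisfies pred."""
    if pred(n):
        return n
    for c in n.children:
        hit = find(c, pred)
        if hit is not None:
            return hit
    return None


def substitute(tree: Node, initial: Node) -> Optional[Node]:
    """Plug an initial tree into the first open site carrying its root label."""
    out = deepcopy(tree)
    site = find(out, lambda n: n.subst and n.label == initial.label)
    if site is None:
        return None
    site.children, site.subst = deepcopy(initial.children), False
    return out


def adjoin(tree: Node, aux: Node) -> Optional[Node]:
    """Splice an auxiliary tree in at the first internal node with its root label;
    the subtree previously at that node moves under the auxiliary tree's foot."""
    out = deepcopy(tree)
    target = find(out, lambda n: bool(n.children) and n.label == aux.label)
    if target is None:
        return None
    new_aux = deepcopy(aux)
    foot = find(new_aux, lambda n: n.foot)
    assert foot is not None, "auxiliary tree needs a foot node"
    foot.children, foot.foot = target.children, False
    target.children = new_aux.children
    return out


def train_bigram(corpus: List[List[str]]):
    """Toy add-one-smoothed bigram model (a stand-in, not the paper's LM)."""
    hist, pairs = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        hist.update(toks[:-1])
        pairs.update(zip(toks[:-1], toks[1:]))
    vocab_size = len({w for s in corpus for w in s}) + 1   # +1 for </s>
    return hist, pairs, vocab_size


def perplexity(sent: List[str], model) -> float:
    hist, pairs, v = model
    toks = ["<s>"] + sent + ["</s>"]
    logp = sum(math.log((pairs[a, b] + 1) / (hist[a] + v))
               for a, b in zip(toks[:-1], toks[1:]))
    return math.exp(-logp / (len(toks) - 1))


# Hypothetical elementary trees with CTB-style labels.
alpha = Node("IP", [Node("NP", subst=True),
                    Node("VP", [Node("VV", [Node("喜欢")]), Node("NP", subst=True)])])
np_zhangsan = Node("NP", [Node("NR", [Node("张三")])])
np_music = Node("NP", [Node("NN", [Node("音乐")])])
beta = Node("VP", [Node("ADVP", [Node("AD", [Node("非常")])]), Node("VP", foot=True)])

# Two substitutions fill subject and object; adjoining inserts the adverb above VP.
derived = adjoin(substitute(substitute(alpha, np_zhangsan), np_music), beta)
print(derived.leaves())   # ['张三', '非常', '喜欢', '音乐']

# Keep derived sentences whose perplexity is below a hypothetical threshold.
lm = train_bigram([["张三", "喜欢", "音乐"], ["李四", "非常", "喜欢", "电影"]])
candidates = [derived.leaves(), ["音乐", "喜欢", "非常", "张三"]]
kept = [s for s in candidates if perplexity(s, lm) < 6.0]
print(kept)   # [['张三', '非常', '喜欢', '音乐']]
```

In the paper's setting, the elementary trees would come from the lexicalized tree extraction algorithm run over CTB5 rather than being written by hand. The paper's own language model would replace the bigram counts. The sketch only shows how substitution and adjoining compose into a new complete tree, and how a score-based filter keeps the fluent derivation while dropping a scrambled one.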