
Improving Low-resource Dependency Parsing Using Multi-strategy Data Augmentation

Cited by: 6
Abstract: Dependency parsing aims to identify the syntactic dependencies between words in a sentence. It provides syntactic features that improve model performance in tasks such as information extraction, automatic question answering and machine translation. The size of the training data has a significant impact on the performance of a dependency parsing model: a lack of training data causes serious unknown-word and overfitting problems. This paper proposes several data augmentation strategies for low-resource dependency parsing. Synonym substitution effectively expands the training data and alleviates the unknown-word problem. Multiple Mixup-based data augmentation strategies effectively alleviate overfitting and improve the model's generalization ability. Experimental results on the Universal Dependencies treebanks (UD treebanks) show that the proposed methods effectively improve dependency parsing performance for Thai, Vietnamese and English under small-scale training corpus conditions.
Authors: XIAN Yan-tuan (线岩团); GAO Fan-ya (高凡雅); XIANG Yan (相艳); YU Zheng-tao (余正涛); WANG Jian (王剑). Affiliations: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China.
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 1, pp. 73-79 (7 pages).
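Funding: National Natural Science Foundation of China (61732005, 61972186); Yunnan Provincial Major Science and Technology Special Project (202002AD080001, 202103AA080015); Yunnan Provincial High-tech Industry Special Project (201606).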
Keywords: Dependency parsing; Low-resource language; Mixup data augmentation; Synonym substitution; Multi-strategy

