Abstract
To address inaccurate original data information, low sample quality and poor model generalization in Multiple-Choice Question Answering (MCQA), a Graph Convolutional Network (GCN)-based mask data augmentation model, GMDA (Graph convolution network-based MASK Data Augmentation), was proposed. With GCN as the basic framework, firstly, the words in the article were abstracted as graph nodes and connected by Question-candidate Answer (QA) pair nodes to establish connections with related article nodes. Secondly, the similarity between nodes was calculated, and a masking technique was applied to mask nodes in the graph to generate augmented samples. Thirdly, GCN was used to perform feature expansion on the augmented samples to enhance the model's information representation capability. Finally, a scorer was introduced to score the original and augmented samples, and a curriculum learning strategy was combined to improve the accuracy of answer prediction. The results of comprehensive evaluation experiments show that, compared with the best baseline model EAM on the RACE-M and RACE-H datasets, the proposed GMDA model improves the accuracy by an average of 0.8 and 0.4 percentage points respectively; compared with the best baseline model STM (Self-Training Method) on the DREAM dataset, GMDA improves the average accuracy by 1.4 percentage points. In addition, comparative experiments verify the effectiveness of GMDA in MCQA tasks, which can support further research on and application of data augmentation techniques in this field.
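The abstract does not state the similarity measure, the masking rule, or the exact GCN variant. The following is a minimal sketch of the "mask similar nodes, then expand features with GCN" idea, assuming cosine similarity, a hypothetical policy that replaces the k article nodes most similar to the QA node with an all-zero [MASK] vector, and the standard Kipf-Welling propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). The names `mask_similar_nodes` and `gcn_layer` and the toy graph are illustrative only, not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mask_similar_nodes(features: Matrix, qa_index: int, k: int) -> Matrix:
    """Hypothetical masking policy: copy `features` and replace the k article
    nodes most similar to the QA node with an all-zero [MASK] vector."""
    qa = features[qa_index]
    candidates = [i for i in range(len(features)) if i != qa_index]
    ranked = sorted(candidates, key=lambda i: cosine(features[i], qa), reverse=True)
    masked = set(ranked[:k])
    dim = len(qa)
    return [[0.0] * dim if i in masked else list(row) for i, row in enumerate(features)]

def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One standard GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    in_dim, out_dim = len(features[0]), len(weight[0])
    # Aggregate neighbour features: norm @ features.
    agg = [[sum(norm[i][m] * features[m][d] for m in range(n)) for d in range(in_dim)]
           for i in range(n)]
    # Linear transform followed by ReLU: ReLU(agg @ weight).
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]

# Toy graph: nodes 0-2 are article words, node 3 is a QA-pair node linked to all words.
features = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [1.0, 0.2]]
adj = [[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1], [1, 1, 1, 0]]
masked = mask_similar_nodes(features, qa_index=3, k=1)  # node 0 is most similar, so it is masked
hidden = gcn_layer(adj, masked, [[1.0, 0.0], [0.0, 1.0]])
```

Masking the most similar nodes (rather than the least similar) is only one possible choice; it forces the GCN to recover answer-relevant evidence from neighbouring nodes, but the paper's actual rule may differ.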
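The abstract combines a scorer with curriculum learning but does not define either. A minimal sketch, assuming the scorer's output is the softmax probability a model assigns to the gold option (higher meaning easier) and a cumulative "easy-to-hard" schedule in which each training stage adds the next slice of harder samples. The functions `gold_confidence` and `curriculum_stages`, the sample ids and the logits are hypothetical and stand in for whatever scorer the paper uses.

```python
import math
from typing import Dict, List, Sequence

def gold_confidence(logits: Sequence[float], gold: int) -> float:
    """Hypothetical difficulty score: softmax probability of the gold option."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[gold] / sum(exps)

def curriculum_stages(samples: List[Dict], num_stages: int) -> List[List[str]]:
    """Sort original and augmented samples from easy (high confidence) to hard,
    then return the cumulative pool of sample ids used at each training stage."""
    scored = sorted(samples, key=lambda s: gold_confidence(s["logits"], s["gold"]),
                    reverse=True)
    ids = [s["id"] for s in scored]
    stages = []
    for t in range(1, num_stages + 1):
        cutoff = math.ceil(len(ids) * t / num_stages)
        stages.append(ids[:cutoff])
    return stages

samples = [
    {"id": "orig-1", "logits": [2.0, 0.1, 0.1, 0.1], "gold": 0},  # confident and correct
    {"id": "aug-1", "logits": [0.5, 0.4, 0.3, 0.2], "gold": 0},   # correct but uncertain
    {"id": "orig-2", "logits": [0.1, 0.1, 3.0, 0.1], "gold": 1},  # model prefers a wrong option
]
stages = curriculum_stages(samples, num_stages=3)
```

A cumulative schedule keeps easy samples in every stage, so the model is not trained on the hardest augmented samples in isolation; other pacing functions are equally plausible.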
Authors
HU Xinrong
CHEN Jingxue
HUANG Zijian
WANG Bangchao
YAO Xun
LIU Junping
ZHU Qiang
YANG Jie
(School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan Hubei 430200, China; School of Computer and Information Technology, University of Wollongong Australia, Wollongong New South Wales 2259, Australia)
Source
Journal of Computer Applications
CSCD
Peking University Core Journal
2024, Issue 11, pp. 3335-3344 (10 pages)
Funding
CCF-Zhipu AI Large Model Innovation Fund Project (CCF-Zhipu202312).
Keywords
Multiple-Choice Question Answering(MCQA)
data augmentation
Graph Convolutional Network(GCN)
scorer
curriculum learning