Abstract
Online education datasets are currently small, and annotated data for aspect term extraction is scarce. To address this, a dataset is constructed from online course reviews. To validate deep-learning-based aspect term extraction, a model based on MacBERT and adversarial training is proposed. The MacBERT layer extracts semantic information from the text and converts it into word vectors; a perturbation is added to the original word vectors to generate adversarial samples, which improves the model's robustness; a BiLSTM layer then captures further contextual information; finally, a CRF layer refines the output to obtain the best predicted sequence. Experiments on the constructed online course review dataset and the public People's Daily dataset show that the model outperforms other mainstream neural-network-based aspect term extraction models, improving the F1 score over the BERT-BiLSTM-CRF model by 7.45% and 7.06%, respectively, which demonstrates the feasibility of the approach.
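Two steps of the pipeline described in the abstract can be illustrated with a small, dependency-free sketch: perturbing a word vector to form an adversarial sample, and decoding the best tag sequence from CRF scores. The abstract does not say how the perturbation is computed, so the FGM-style rule used here (a step of size epsilon along the normalized gradient) is an assumption, and the function names `fgm_perturb` and `viterbi_decode` are hypothetical. In the actual model the gradient would come from backpropagation through the MacBERT embeddings, and the emission scores from the BiLSTM layer.

```python
import math
from typing import Dict, List, Sequence


def fgm_perturb(embedding: Sequence[float], grad: Sequence[float],
                epsilon: float = 1.0) -> List[float]:
    """Return embedding + epsilon * grad / ||grad||_2.

    FGM-style perturbation; assumed here, not confirmed by the abstract.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)  # no gradient signal: leave the vector unchanged
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]


def viterbi_decode(emissions: Sequence[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]],
                   tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag path for emission and transition scores."""
    if not emissions:
        return []
    # best[t][tag] is the score of the best path that ends in `tag` at step t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the path backwards from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy example with a BIO tag set; all scores are made up for illustration.
TAGS = ["O", "B", "I"]
TRANSITIONS = {p: {c: 0.0 for c in TAGS} for p in TAGS}
TRANSITIONS["O"]["I"] = -100.0  # an aspect term cannot start with an I tag
EMISSIONS = [
    {"O": 2.0, "B": 1.0, "I": 0.0},
    {"O": 0.0, "B": 1.0, "I": 1.5},
    {"O": 0.5, "B": 0.0, "I": 1.0},
]
```

In the toy example, picking the highest-scoring tag for each token independently would put an I tag directly after an O tag. The transition penalty lets the decoder reject that path, which is the kind of sequence-level correction the CRF layer contributes.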
Authors
朱梦涵
唐海
李贵荣
徐洪胜
刘洋
ZHU Menghan; TANG Hai; LI Guirong; XU Hongsheng; LIU Yang (College of Electrical and Information Engineering, Hubei University of Automotive Technology, Shiyan, Hubei 442002)
Source
《山西大同大学学报(自然科学版)》
2024, No. 5, pp. 21-26, 55 (7 pages in total)
Journal of Shanxi Datong University (Natural Science Edition)
Funding
2022 Key Projects of the Hubei Provincial Education Science Plan [2022GA049], [2020GA045]
2023 Hubei Provincial Science and Technology Plan Project [2023EHA018].