Abstract
Building automatic question answering (QA) systems is part of the ongoing informatization of the tourism industry, and question classification is a key component of such systems. Traditional question category systems take a single perspective, and traditional classification models perform poorly on imbalanced question datasets. To address these problems, this paper constructs a question category system for the tourism domain from two perspectives, question topic and answer type, and proposes MT-Bert, a multi-task question classification model. MT-Bert performs multi-task training on BERT, adds a self-attention mechanism, uses a Softmax classifier, and is trained with a specially designed multi-task fusion loss function. Experiments on a Shanxi tourism dataset show that MT-Bert achieves micro-averaged F1 scores of 97.6% and 91.7% on the two category systems, respectively. It also avoids prediction failures on imbalanced data and can therefore handle such data effectively.
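The abstract reports micro-averaged F1 and refers to prediction failure on imbalanced data. The following is a minimal sketch of how these two ideas relate, not the authors' evaluation code. The label names (`scenic`, `ticket`) and the toy predictions are hypothetical and do not come from the paper's dataset.

```python
from collections import Counter


def micro_f1(gold, pred):
    """Micro-averaged F1 for single-label classification.

    True positives, false positives and false negatives are pooled
    over all classes before precision and recall are computed.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            tp += 1
        else:
            fp += 1  # the predicted class gains a false positive
            fn += 1  # the gold class gains a false negative
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_class_f1(gold, pred):
    """F1 for each class separately, returned as {label: f1}."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical imbalanced toy data: 8 majority vs. 2 minority questions.
    gold = ["scenic"] * 8 + ["ticket"] * 2
    # A degenerate classifier that always predicts the majority class.
    pred = ["scenic"] * 10
    print(micro_f1(gold, pred))
    print(per_class_f1(gold, pred))
```

In this toy case the pooled score stays fairly high while the minority class scores zero. A result like this is one way to understand the prediction failure on imbalanced data mentioned above, and it shows why per-class scores are worth inspecting alongside the micro average.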
Authors
陈千
冯子珍
王素格
郭鑫
Chen Qian; Feng Zizhen; Wang Suge; Guo Xin (Faculty of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China; Key Laboratory of Computer Intelligence and Chinese Information Processing, Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China)
Source
《计算机应用与软件》
Peking University Core Journal (北大核心)
2024, No. 1, pp. 336-342 (7 pages)
Computer Applications and Software
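Funding
Key Research and Development Program of Shanxi Province (201803D421024)
Applied Basic Research Program of Shanxi Province (201901D111032, 201701D221101)
National Natural Science Foundation of China (61502288, 61403238)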
Keywords
Tourism question answering (QA)
Question classification
Classification system
BERT
Self-attention
Multi-task