Abstract
In intelligent customer service question answering systems, user questions are characterized by highly sparse features, heavily colloquial wording, and typos. These traits lower the accuracy of question similarity computation and lead to answers that do not address the question. This paper proposes SA-BiLSTM, a question similarity computation model based on bidirectional long short-term memory (Bi-LSTM) networks. Questions are represented as character vectors; a Bi-LSTM extracts the word-order features of each sentence, and a self-attention mechanism dynamically adjusts the feature weights, improving the model's ability to understand the question. Experiments on the WeBank intelligent customer service question matching competition dataset (CCKS2018 Task3) show that character vector representations of questions outperform word vector representations, that self-attention lets the model learn more of the key features in a question, and that SA-BiLSTM recognizes questions more reliably, with its F1 score improving by 1.42%.
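The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation of the attention layer, so the code below swaps in plain attention pooling with a single query vector as a simplified stand-in. The function names (`attention_pool`, `cosine`), the toy feature vectors, and the query vector are all assumptions made for illustration. In a real model the per-character features would come from a trained Bi-LSTM and the query would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Score each per-character feature vector against `query`,
    normalize the scores with softmax, and return the weighted sum
    together with the weights. `query` is a hypothetical stand-in
    for learned attention parameters."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy per-character features for two questions (made-up numbers,
# standing in for Bi-LSTM outputs).
question_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question_b = [[0.9, 0.1], [1.0, 1.0]]
query = [1.0, 1.0]  # hypothetical attention query

vec_a, weights_a = attention_pool(question_a, query)
vec_b, _ = attention_pool(question_b, query)
print("attention weights for question A:", weights_a)
print("similarity:", cosine(vec_a, vec_b))
```

Positions whose features align more closely with the query receive larger weights, which is the sense in which an attention layer can emphasize the key characters of a question before two questions are compared.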
Authors
黄晓洲
段隆振
周玲元
HUANG Xiao-zhou; DUAN Long-zhen; ZHOU Ling-yuan (College of Information Engineering, Nanchang University, Nanchang 330029, China; College of Economics and Management, Nanchang Hangkong University, Nanchang 330063, China)
Source
《计算机仿真》
Peking University Core Journal
2022, Issue 10, pp. 486-491 (6 pages)
Computer Simulation
Funding
Supported by the National Natural Science Foundation of China (71761028).