Abstract
Traditional multi-label classification models do not fully account for the complex correlations between adjacent labels in a text. To address this, a multi-label classification model for traditional Chinese medicine (TCM) texts based on a local-attention Seq2Seq architecture is proposed. First, the ALBERT model is used to extract dynamic semantic vectors from the text. Next, an encoding layer composed of multiple Bi-LSTM layers extracts the semantic relationships between texts. Finally, the decoding layer applies local attention over multiple LSTM layers to emphasize the mutual influence between adjacent labels in the text sequence and predict the multi-label sequence. The method is validated on TCM datasets, and the experimental results show that the proposed algorithm effectively captures the correlations between labels and is suitable for classification prediction on TCM texts.
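The local attention used in the decoding layer can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the function name `local_attention`, the dot-product scoring, and the fixed window `half_width` are hypothetical choices in the style of windowed (Luong-type) local attention, not the authors' implementation. The sketch shows only the core idea: at each decoding step, attention weights are computed over a small window of encoder states around a center position instead of over the whole sequence.

```python
import math

def local_attention(decoder_state, encoder_states, center, half_width=2):
    """Windowed dot-product attention (illustrative sketch, not the paper's code).

    Only encoder states at positions [center - half_width, center + half_width]
    receive non-zero weight. Assumes 0 <= center < len(encoder_states).
    """
    lo = max(0, center - half_width)
    hi = min(len(encoder_states), center + half_width + 1)
    window = encoder_states[lo:hi]

    # Alignment scores: dot product of the decoder state with each windowed encoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in window]

    # Numerically stable softmax restricted to the window.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]

    # Context vector: weighted sum of the windowed encoder states.
    dim = len(decoder_state)
    context = [sum(w * h[i] for w, h in zip(weights, window)) for i in range(dim)]

    # Full-length weight vector with zeros outside the window.
    full_weights = [0.0] * len(encoder_states)
    full_weights[lo:hi] = weights
    return context, full_weights
```

In a full model, `encoder_states` would come from the Bi-LSTM encoder and `decoder_state` from the LSTM decoder at the current step; restricting the softmax to a window is what lets nearby positions dominate the prediction of the next label.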
Authors
刘勇
杜建强
罗计根
李清
于梦波
郑奇民
LIU Yong; DU Jianqiang; LUO Jigen; LI Qing; YU Mengbo; ZHENG Qimin (College of Computer Science, Jiangxi University of Chinese Medicine, Nanchang 330004, China; Qihuang Chinese Medicine Academy, Jiangxi University of Chinese Medicine, Nanchang 330025, China)
Source
《现代信息科技》
2023, Issue 17, pp. 96-101 (6 pages)
Modern Information Technology
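Funding
National Natural Science Foundation of China (82260988)
Natural Science Foundation of Jiangxi Province (20202BAB202019)
Jiangxi Provincial Graduate Innovation Special Fund Project (YC2021-S499)
Jiangxi University of Chinese Medicine University-Level Science and Technology Innovation Team Development Program (CXTD22015)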