
Medical Text Classification Based on Transfer Learning and Ensemble Learning
Abstract To address the semantic sparsity and high dimensionality of medical text, this paper proposes TLCM (Trans-LSTM-CNN-Multi), a multi-label medical text classification algorithm based on transfer learning and ensemble learning. First, the multi-layer bidirectional Transformer structure inside the ALBERT (A Lite BERT) model is trained on a large-scale corpus to obtain dynamic character-level vector representations of general-domain text. Then, a target data set from the medical domain is used, via transfer learning and model fine-tuning, to enhance the text semantics of the ALBERT pre-trained language model for the medical domain. On this basis, the representations produced by the semantically enhanced model are fed into a Bi-LSTM-CNN ensemble learning module, which further extracts important features from the medical text. Finally, a multi-label text classifier is built on the binary cross-entropy loss function to classify medical text. Experimental results show that TLCM effectively improves medical text classification performance, reaching an overall F1 value of 91.8% on a Chinese health question data set.
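The final step of the abstract, a multi-label classifier trained with binary cross-entropy, can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation: the function names (`sigmoid`, `bce_loss`, `predict`, `micro_f1`), the 0.5 decision threshold, the toy logits, and the choice of micro-averaged F1 are all assumptions made for illustration, and the abstract does not say how its "overall F1" is averaged.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_loss(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over every (sample, label) pair.

    Each label gets its own independent sigmoid, which is what lets
    one sample carry several labels at once.
    """
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y in zip(row_logits, row_targets):
            # Clamp the probability so log() never sees 0.
            p = min(max(sigmoid(z), eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


def predict(logits, threshold=0.5):
    """Turn per-label logits into 0/1 decisions (threshold is an assumption)."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


def micro_f1(preds, targets):
    """Micro-averaged F1: pool TP/FP/FN over all labels and samples."""
    tp = fp = fn = 0
    for p_row, t_row in zip(preds, targets):
        for p, t in zip(p_row, t_row):
            tp += p & t
            fp += p & (1 - t)
            fn += (1 - p) & t
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical logits for 2 samples and 3 labels; in a TLCM-style model
# they would come from the layer after the Bi-LSTM-CNN feature extractor.
logits = [[2.0, -1.0, 0.5], [-3.0, 1.5, -0.2]]
targets = [[1, 0, 1], [0, 1, 1]]

preds = predict(logits)
loss = bce_loss(logits, targets)
score = micro_f1(preds, targets)
```

Because each label is scored by an independent sigmoid rather than a shared softmax, a single health question can be assigned several categories at once, which is the property a multi-label setting needs.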
Authors ZHENG Cheng-yu (郑承宇); WANG Xin (王新); WANG Ting (王婷); XU Quan-feng (徐权峰) (School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China)
Source Computer Technology and Development (《计算机技术与发展》), 2022, Issue 4, pp. 28-33 (6 pages)
Funding National Natural Science Foundation of China (61363022); Scientific Research Fund of the Yunnan Provincial Department of Education (2021Y670).
Keywords transfer learning; ensemble learning; ALBERT; Bi-LSTM-CNN; medical text; health question
