Abstract
Most research on classifying clinical trial screening criteria focuses on English eligibility criteria. This paper studies classification models suited to Chinese eligibility criteria, using the Chinese clinical trial short-text dataset developed by the 5th China Health Information Processing Conference. Neural networks and pre-trained language models are built and fine-tuned for the classification task, and the performance of the Word2vec-BiLSTM model, the CNN model, the RNN model, and pre-trained language models is compared on this task. Experiments show that the pre-trained model ERNIE classifies better than the other models. To address the imbalance of the data, data augmentation is applied to the corpora of the under-represented categories, which effectively improves model performance. The results show that the macro-average F1 and micro-average F1 of the ERNIE model reach 0.8281 and 0.8537, respectively.
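The abstract reports both a macro-average and a micro-average F1, and the two differ most on imbalanced data. The following is a minimal sketch of how the two averages are usually computed for single-label multi-class predictions; it is not the authors' evaluation code, and the category names and toy predictions are hypothetical, chosen only for illustration.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted wrongly
            fn[t] += 1      # class t was missed

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 values, so every class counts equally
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes first, then compute one F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical, imbalanced toy data (category names are illustrative only)
y_true = ["Age", "Age", "Age", "Age", "Disease", "Disease", "Therapy"]
y_pred = ["Age", "Age", "Age", "Age", "Disease", "Age", "Age"]

macro, micro = f1_scores(y_true, y_pred)
print(f"macro-F1 = {macro:.4f}, micro-F1 = {micro:.4f}")
```

In this toy case the rare category is never predicted correctly, so the macro average falls well below the micro average. This is the kind of gap that augmenting the under-represented categories is meant to narrow.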
Authors
刘子琦
胡建成
牟谷芳
LIU Ziqi; HU Jiancheng; MOU Gufang (College of Applied Mathematics, Chengdu University of Information Technology, Chengdu 610225, China)
Source
《成都信息工程大学学报》
2024, No. 2, pp. 170-177 (8 pages)
Journal of Chengdu University of Information Technology
Keywords
clinical trials
medical short text classification
deep learning
pre-trained model