Abstract
Objective To automate risk of bias assessment for randomized controlled trial (RCT) literature, using BERT (Bidirectional Encoder Representations from Transformers) for feature representation and text classification.
Methods We searched The Cochrane Library to collect information on RCTs and the corresponding risk of bias assessment data, and built the data sets needed for text classification from them. BERT was used to extract features and to build text classification models that assign a risk of bias value (high or low) for each of seven types of bias. We assigned 80% of the original data set to the training set, 10% to the test set, and 10% to the validation set. Model performance was evaluated with precision (P), recall (R), and F1 score, and the results were compared with those of a traditional machine learning method (feature engineering combining n-gram and TF-IDF, with a Linear SVM classifier).
Results The BERT-based model achieved F1 scores of 78.5% to 95.2% on the seven risk of bias assessment tasks, 14.7% higher than the traditional machine learning method. On the task of extracting the sentences that describe bias, for the six bias types other than "other sources of bias", it achieved F1 scores of 85.7% to 92.8%, 18.2% higher than the traditional machine learning method.
Conclusions The BERT-based model can assess the risk of bias of RCT literature automatically with relatively high accuracy, improving the efficiency and speed of completing systematic reviews.
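The evaluation protocol summarized in the abstract (an 80%/10%/10% split and precision, recall, and F1 on binary high/low labels) can be illustrated with a minimal standard-library Python sketch. The function names, the random seed, the choice of "high" as the positive class, and the toy labels are all assumptions made for illustration; the abstract does not state how the split was drawn or how the scores were averaged.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train/test/validation (80%/10%/10%).

    The seed and the shuffling step are assumptions; the abstract only
    gives the proportions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding
    n_test = n // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, validation


def precision_recall_f1(y_true, y_pred, positive="high"):
    """Compute precision, recall, and F1 for one positive class.

    Treating "high" risk as the positive class is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels for a single bias type (hypothetical data).
    gold = ["high", "high", "low", "low", "high"]
    predicted = ["high", "low", "low", "high", "high"]
    print(precision_recall_f1(gold, predicted))
```

In a setup like the one described, such a function would be applied once per bias type, giving one F1 score for each of the seven classification tasks.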
Authors
XIA Yuan (夏渊)
LIU Dongfeng (刘东峰)
ZHANG Jinkui (张津馗)
LI Ke (李科)
School of Life Science & Technology, University of Electronic Science & Technology of China, Chengdu 610051, P.R. China
Source
Chinese Journal of Evidence-based Medicine (《中国循证医学杂志》)
CSCD
Peking University Core Journals (北大核心)
2021, Issue 2, pp. 204-209 (6 pages)
Funding
Sichuan Science and Technology Program (grant nos. 2019JDPT0008, 18PTDJ0116).
Keywords
Evidence-based medicine
Systematic review
Automation
Risk of bias assessment
BERT