Abstract
This study evaluates how well current machine-learning-based pathogenicity prediction tools perform on rare missense variants. We derived two mutually independent test sets: test set I drew its pathogenic and putatively neutral SNVs from ClinVar, and test set II drew them from UniProt. We also downloaded real genomic sequencing data from patients with previously reported causative variants. These data were used to assess how well each tool detects pathogenic variants in clinical practice. The tools differed in their ability to predict the pathogenicity of rare missense variants, and REVEL achieved the highest performance on both independent test sets. However, the evaluation on real sequencing data indicated that choosing a pathogenicity prediction tool requires balancing the false-positive rate against the false-negative rate.
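The abstract's closing point is that choosing a tool means trading the false-positive rate against the false-negative rate. A minimal Python sketch can illustrate this trade-off. It assumes that each tool outputs a score in which higher means "more likely pathogenic," and that a variant is called pathogenic when its score meets a cutoff. The function name `error_rates`, the toy scores and labels, and the cutoffs are all hypothetical. They are not data, thresholds, or the evaluation procedure from this study.

```python
from typing import List, Tuple


def error_rates(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Return (false-positive rate, false-negative rate) for one tool at one cutoff.

    Hypothetical convention: label 1 = pathogenic, 0 = neutral, and a variant
    is called pathogenic when its score is >= threshold.
    """
    # Neutral variants wrongly called pathogenic
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    # Pathogenic variants missed by the cutoff
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    n_neutral = sum(1 for y in labels if y == 0)
    n_pathogenic = len(labels) - n_neutral
    fpr = fp / n_neutral if n_neutral else 0.0
    fnr = fn / n_pathogenic if n_pathogenic else 0.0
    return fpr, fnr


# Toy (made-up) scores for six variants: three pathogenic, three neutral
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0]

# Sweep a few hypothetical cutoffs to show how the two error rates move
for t in (0.3, 0.5, 0.8):
    fpr, fnr = error_rates(scores, labels, t)
    print(f"threshold={t:.1f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

In this toy example, lowering the cutoff removes the missed pathogenic variants but keeps the false positive. Raising the cutoff removes the false positive but misses more pathogenic variants. No single cutoff minimizes both rates.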
Authors
DANG Xiao
SUN Yuhui
JIANG Tingya
ZHOU Yang
LIAN Chaoqun
(The Roberts Center for Pediatric, Philadelphia, PA 19146, USA; Allodx Biotech. Co., Ltd., Suzhou 215000, China; Institute of Life Sciences, Jiangsu University, Zhenjiang 212013, China; Department of Biochemistry and Molecular Biology, Bengbu Medical University, Zhenjiang 212013, China)
Source
Journal of West Anhui University
2018, No. 5, pp. 97-101, 114 (6 pages)
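Funding
Bengbu Medical College Development Fund (BYKF1727)
National Undergraduate Innovation Program (201710367036, 201810367021)
National Natural Science Foundation of China (31301919)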
Keywords
machine learning
pathogenicity prediction tools
rare missense variants
pathogenic variants
neutral variants