Abstract
Identifying the genetic types that carry a high risk of birth defects, and advancing early screening for hereditary diseases and reproductive guidance, are important for the primary prevention of birth defects such as congenital hearing loss. This study uses the decision tree algorithm of a general-purpose data mining tool to analyze the clinical data of nearly 1,000 GJB2 gene mutation tests. From this analysis it builds an auxiliary screening model for the pathogenic genes behind hearing-related birth defects. An examination of the model tree's structure and of how it classifies the samples shows that five groups of branches in the tree yield pure hearing-loss-positive samples, meaning every sample reaching those branches has hearing loss. Moreover, the set of gene-locus states that defines each of these branches agrees with pathogenic mutation states confirmed by clinical studies. A screening model built with this decision tree method can help clinicians quickly identify the types of pathogenic genes in clinical big data.
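The abstract does not specify the data mining tool, the split criterion, or the GJB2 loci used. The sketch below therefore rests on several assumptions, all labeled here. It grows a small multiway decision tree over genotype records using Gini impurity. It then reads off the "pure positive" branches: root-to-leaf paths whose leaves contain only hearing-loss-positive samples, each expressed as a set of (locus, state) conditions. The locus names (`locus_A`, `locus_B`), the genotype states (`wt`, `het`, `hom`), the helper functions, and the toy records are all hypothetical illustrations. They are not the study's data or method.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: each record maps a locus name to a genotype state,
# "wt" (wild type), "het" (heterozygous) or "hom" (homozygous mutant).
Record = Dict[str, str]
Path = Tuple[Tuple[str, str], ...]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (1 = hearing-loss positive)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows: List[Record], labels: List[int], loci: List[str]) -> Optional[str]:
    """Locus whose multiway split most lowers weighted Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for locus in loci:
        groups: Dict[str, List[int]] = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[locus], []).append(y)
        if len(groups) < 2:
            continue  # all samples share one state at this locus: no information
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if score < best_score:
            best, best_score = locus, score
    return best


def build_tree(rows: List[Record], labels: List[int], loci: List[str]) -> dict:
    """Grow the tree until a node is pure or no split lowers its impurity."""
    locus = None if len(set(labels)) <= 1 else best_split(rows, labels, loci)
    if locus is None:
        return {"leaf": True, "labels": labels}
    remaining = [name for name in loci if name != locus]
    children = {}
    for state in sorted({r[locus] for r in rows}):
        idx = [i for i, r in enumerate(rows) if r[locus] == state]
        children[state] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return {"leaf": False, "locus": locus, "children": children}


def pure_positive_paths(tree: dict, path: Path = ()) -> List[Path]:
    """Collect the (locus, state) conditions leading to leaves whose samples are all positive."""
    if tree["leaf"]:
        labels = tree["labels"]
        return [path] if labels and all(y == 1 for y in labels) else []
    found: List[Path] = []
    for state, child in tree["children"].items():
        found.extend(pure_positive_paths(child, path + ((tree["locus"], state),)))
    return found


# Toy, hypothetical records for illustration only (not the study's data).
LOCI = ["locus_A", "locus_B"]
ROWS: List[Record] = [
    {"locus_A": "hom", "locus_B": "wt"},
    {"locus_A": "hom", "locus_B": "wt"},
    {"locus_A": "het", "locus_B": "het"},
    {"locus_A": "het", "locus_B": "wt"},
    {"locus_A": "wt", "locus_B": "wt"},
    {"locus_A": "wt", "locus_B": "het"},
    {"locus_A": "wt", "locus_B": "hom"},
]
LABELS = [1, 1, 1, 0, 0, 0, 1]

if __name__ == "__main__":
    tree = build_tree(ROWS, LABELS, LOCI)
    for p in pure_positive_paths(tree):
        print(" AND ".join(f"{locus}={state}" for locus, state in p))
```

On these toy records the tree splits first on `locus_A`. It reports three pure positive branches: `locus_A=het AND locus_B=het`, `locus_A=hom`, and `locus_A=wt AND locus_B=hom`. Library implementations such as scikit-learn's `DecisionTreeClassifier` build binary CART trees over numeric features, so genotype states would first need to be encoded numerically. The hand-written multiway split is used here only to keep each branch readable as a set of locus states.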
Authors
GUO Yu; LI Fengmei; CHEN Yuhang; HONG Kaicheng; ZHAO Yeming; CHEN Xiaohe (Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu 215163, China)
Source
Journal of Terahertz Science and Electronic Information Technology
Peking University Core Journal
2020, No. 4, pp. 703-707 (5 pages)
Funding
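Supported by a Key Special Project of the National Key Research and Development Program, Ministry of Science and Technology of China (2017YFC1001800).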
Keywords
clinical big data
hearing loss
decision tree
genetic screening
data mining
artificial intelligence