BERT and CNN-Based Deleterious Splicing Mutation Prediction Method

Abstract: A key challenge in genetic diagnosis is assessing splicing-related pathogenic genetic mutations. Most existing tools for predicting pathogenic splicing mutations are based on traditional machine learning methods and rely mainly on manually extracted splicing features, which limits their predictive performance, particularly for non-canonical splicing mutations. This paper therefore proposes BCsplice, a deleterious splicing mutation prediction method based on BERT (Bidirectional Encoder Representations from Transformers) and a CNN (Convolutional Neural Network). In BCsplice, the BERT module extracts the contextual information of a sequence; combined with a CNN that extracts local features, the model learns the semantic information of the sequence and predicts the pathogenicity of splicing mutations. The effect of non-canonical splicing mutations often depends more heavily on deep semantic information in the sequence context. Using the CNN to combine and extract BERT's multi-level semantic information yields a rich representation that helps identify non-canonical splicing mutations. Comparative experiments show that BCsplice performs well, with a performance advantage in non-canonical splicing regions in particular, and that it can support the identification of pathogenic splicing mutations and clinical genetic diagnosis.
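The idea of combining multi-level encoder representations through a convolution can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the number of layers, the filter width, the global max-pooling, the logistic output, and all function names below are assumptions made for illustration. Each "layer" here holds a single scalar per sequence position, whereas real BERT hidden states are vectors, and the weights are random rather than trained.

```python
import math
import random


def conv1d_over_layers(layer_states, kernel, bias=0.0):
    """Slide one convolution filter along the sequence.

    layer_states: list of L layers, each a list of T per-position scalars
                  (a toy stand-in for per-layer encoder hidden states).
    kernel:       list of L rows, each with K weights (layers act as channels).
    Returns T - K + 1 ReLU-activated feature values.
    """
    num_layers = len(layer_states)
    seq_len = len(layer_states[0])
    width = len(kernel[0])
    features = []
    for start in range(seq_len - width + 1):
        total = bias
        for layer in range(num_layers):
            for offset in range(width):
                total += kernel[layer][offset] * layer_states[layer][start + offset]
        features.append(max(0.0, total))  # ReLU
    return features


def predict_score(layer_states, kernels, out_weights, out_bias=0.0):
    """Global max-pool each filter's output, then apply a logistic output unit."""
    pooled = [max(conv1d_over_layers(layer_states, k)) for k in kernels]
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy input (hypothetical): 4 "encoder layers" x 10 sequence positions.
rng = random.Random(0)
layer_states = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(4)]
# Two untrained filters of width 3, each spanning all 4 layers.
kernels = [
    [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
    for _ in range(2)
]
out_weights = [rng.uniform(-1, 1) for _ in kernels]

score = predict_score(layer_states, kernels, out_weights)
print(f"toy score in (0, 1): {score:.3f}")
```

Treating the encoder layers as input channels lets each filter weigh shallow and deep representations of the same local window together, which is one plausible reading of "combining multi-level semantic information"; the paper itself should be consulted for the actual design.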

Authors: SONG Chengcheng; ZHAO Yiran; LI Xiaoyan; XIA Junfeng (Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601)

Affiliation: Institutes of Physical Science and Information Technology, Anhui University

Source: Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2024, No. 2, pp. 181-190 (10 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: Supported by the National Natural Science Foundation of China (No. U22A2038).

Keywords: Deleterious Splicing Mutation; Deep Learning; Prediction Model; Pathogenicity Prediction

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
