Abstract
The function of most non-coding regions of the genome remains unclear, yet many genetic variants lie in these regions, so identifying disease-associated variants is still a challenge. CADD, a support vector machine based algorithm, can annotate variants in both coding and non-coding regions, but it fails to capture non-linear relationships among features. To address this, the paper designs a hybrid model that combines a convolutional neural network with a fully connected neural network and captures non-linear relationships among features well. On the test set, the method achieves the highest accuracy, 66.44%.
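The abstract does not give the architecture in detail, so the following is only a minimal sketch of what a hybrid convolutional plus fully connected model over a variant's annotation feature vector could look like. All names (`init_params`, `forward`, `conv1d_valid`), the layer sizes, and the choice to concatenate the two branches before a sigmoid output are assumptions for illustration, not the authors' published design. Only the forward pass is shown; training is omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv1d_valid(x, kernels, bias):
    """1-D convolution (no padding) over the feature axis.

    x: (batch, n_features); kernels: (n_filters, width); bias: (n_filters,)
    returns: (batch, n_filters, n_features - width + 1)
    """
    width = kernels.shape[1]
    n_pos = x.shape[1] - width + 1
    # Stack every window of `width` adjacent features: (batch, n_pos, width)
    windows = np.stack([x[:, i:i + width] for i in range(n_pos)], axis=1)
    return np.einsum("bpw,fw->bfp", windows, kernels) + bias[None, :, None]


def init_params(n_features, n_filters=4, width=3, hidden=8, seed=0):
    """Random weights; all sizes here are hypothetical."""
    rng = np.random.default_rng(seed)
    n_pos = n_features - width + 1
    return {
        "conv_w": rng.normal(0.0, 0.1, (n_filters, width)),
        "conv_b": np.zeros(n_filters),
        "fc_w": rng.normal(0.0, 0.1, (n_features, hidden)),
        "fc_b": np.zeros(hidden),
        "out_w": rng.normal(0.0, 0.1, (n_filters * n_pos + hidden, 1)),
        "out_b": np.zeros(1),
    }


def forward(params, x):
    """Return a pathogenicity-style score in (0, 1) for each row of x."""
    # Convolutional branch: local patterns among neighbouring features.
    conv = relu(conv1d_valid(x, params["conv_w"], params["conv_b"]))
    conv_flat = conv.reshape(x.shape[0], -1)
    # Fully connected branch: interactions across all features.
    dense = relu(x @ params["fc_w"] + params["fc_b"])
    # Merge both branches (an assumed design) and apply a sigmoid output.
    merged = np.concatenate([conv_flat, dense], axis=1)
    logits = merged @ params["out_w"] + params["out_b"]
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    features = rng.normal(size=(5, 10))  # 5 variants, 10 made-up features
    print(forward(init_params(10), features))
```

The ReLU activations in both branches are what let such a model represent non-linear relationships among features, which a linear classifier cannot.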
Authors
YANG Shu-xin (杨书新)
TANG Da-rong (汤达荣)
College of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
Source
Journal of Biology (《生物学杂志》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2019, Issue 4, pp. 94-96, 101 (4 pages in total)
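Funding
National Natural Science Foundation of China, Regional Program (41362015)
Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ170518)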
Keywords
deep learning
genetic variants
pathogenicity
annotation