Abstract
Genotyping structural variations in the presence of copy number variations (CNVs) is a challenging problem that is still in its infancy. CNVs, a prevalent and critical form of genetic variation that causes abnormal copy numbers of large genomic regions in cells, often affect transcription and contribute to a variety of diseases. The characteristics of CNVs make existing genotyping features and algorithms ambiguous, which can cause heterozygous variations to be erroneously genotyped as homozygous and seriously affect the accuracy of downstream analysis. As the allelic copy number increases, the genotyping error rate rises sharply. Some instances with different copy numbers play an auxiliary role in the genotyping classification problem, whereas others seriously interfere with the accuracy of the model. Motivated by these observations, we propose a transfer learning-based method that genotypes structural variations accurately while accounting for CNVs. The method first separates the instances with different allelic copy numbers and trains the basic machine learning framework with different genotype datasets. It then maximizes the weights of the instances that contribute to classification and minimizes the weights of the instances that hinder correct genotyping. By adjusting the weights of the instances with different allelic copy numbers, the contribution of all instances to genotyping can be maximized, and the genotyping errors of heterozygous variations caused by CNVs can be minimized. We applied the proposed method to both simulated and real datasets and compared it with several popular algorithms, including GATK, Facets and Gindel. The experimental results demonstrate that the proposed method outperforms the others in terms of accuracy, stability and efficiency. The source code is available at github/TrinaZ/CNVtransfer for academic use only.
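The instance-weighting idea described above can be illustrated with a minimal sketch. The abstract does not give the update rule, so the code below assumes a TrAdaBoost-style scheme as a stand-in: instances with other allelic copy numbers are treated as "source" data, instances with the copy number of interest as "target" data, and weights are adjusted after each round according to whether the current classifier mislabels the genotype. The function name `reweight`, its parameters, and the source/target split are hypothetical and are not taken from the CNVtransfer code.

```python
import numpy as np

def reweight(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One TrAdaBoost-style weight update (illustrative assumption,
    not necessarily the rule used by the proposed method).

    w_src, w_tgt       : current weights of source / target instances
    miss_src, miss_tgt : boolean arrays, True where the current
                         classifier mislabels the genotype
    n_rounds           : total number of boosting rounds planned
    """
    w_src = np.asarray(w_src, dtype=float)
    w_tgt = np.asarray(w_tgt, dtype=float)
    miss_src = np.asarray(miss_src, dtype=bool)
    miss_tgt = np.asarray(miss_tgt, dtype=bool)

    # Weighted error on target instances only; clipped so that
    # beta_tgt stays strictly between 0 and 1.
    err = np.sum(w_tgt * miss_tgt) / np.sum(w_tgt)
    err = np.clip(err, 1e-6, 0.499)

    beta_tgt = err / (1.0 - err)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_rounds))

    # Mislabeled source instances are assumed to hinder genotyping:
    # shrink their weights.
    new_src = w_src * np.where(miss_src, beta_src, 1.0)
    # Mislabeled target instances need more attention: grow their weights.
    new_tgt = w_tgt * np.where(miss_tgt, 1.0 / beta_tgt, 1.0)

    # Renormalize so that all weights sum to one.
    total = new_src.sum() + new_tgt.sum()
    return new_src / total, new_tgt / total


# Toy usage: four source and four target instances with uniform weights;
# the first instance of each group is mislabeled.
src, tgt = reweight([0.125] * 4, [0.125] * 4,
                    [True, False, False, False],
                    [True, False, False, False],
                    n_rounds=10)
print(src, tgt)
```

In this toy run the mislabeled source instance ends up with a lower weight than its correctly labeled neighbors, while the mislabeled target instance ends up with a higher one, which mirrors the stated goal of down-weighting instances that hinder genotyping and up-weighting those that help.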
Funding
Supported by the National Natural Science Foundation of China (Grant No. 31701150) and the Fundamental Research Funds for the Central Universities (CXTD2017003).