Although with the continuous development of sequencing technology,the number of genome and protein sequences has grown rapidly,these sequences are only a small part of nature.Biologically,it is still a challenging and...Although with the continuous development of sequencing technology,the number of genome and protein sequences has grown rapidly,these sequences are only a small part of nature.Biologically,it is still a challenging and important problem to detect and predict some new genome or protein sequences based on real sequence data,which motivates us to solve the problem mathematically.The first step to predict the new sequences is determining the nucleotide or amino acid composition of them.In this paper,we apply natural vector method and convex hull principle to determine the nucleotide or amino acid composition of new genome or protein sequences.Our algorithm is based on optimization strategy.The SARS-CoV-2 genome and protein datasets are used to verify the feasibility of our algorithm.Numerical experiments show that our algorithm can detect and predict possible number of each nucleotide or amino acid of genome and protein sequence with respect to the second order natural vectors.展开更多
基金supported by National Natural Science Foundation of China(Grant No.11961141005)Tsinghua University Spring Breeze Fund(Grant No.2020 Z99CFY044)Tsinghua University start-up fund,and Tsinghua University Education Foundation fund(Grant No.042202008).
文摘Although with the continuous development of sequencing technology,the number of genome and protein sequences has grown rapidly,these sequences are only a small part of nature.Biologically,it is still a challenging and important problem to detect and predict some new genome or protein sequences based on real sequence data,which motivates us to solve the problem mathematically.The first step to predict the new sequences is determining the nucleotide or amino acid composition of them.In this paper,we apply natural vector method and convex hull principle to determine the nucleotide or amino acid composition of new genome or protein sequences.Our algorithm is based on optimization strategy.The SARS-CoV-2 genome and protein datasets are used to verify the feasibility of our algorithm.Numerical experiments show that our algorithm can detect and predict possible number of each nucleotide or amino acid of genome and protein sequence with respect to the second order natural vectors.