Abstract
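Gene duplication is a widespread phenomenon that is closely tied to genome evolution and is an important driving force in the divergence of genomes and genetic systems. Systematic studies of duplicated genes among the protein-coding sequences of prokaryotic genomes are still scarce. This paper takes four prokaryotic genomes with different GC contents as its subjects. CodonW was used to analyze the codon usage bias of functional genes with identical sequences in each genome, and CD-hit was used to analyze the duplicated protein-coding genes in each genome at a threshold of 80%. The results show that duplicated sequences are common among the protein-coding genes of all four genomes, accounting for 2.77% to 7.03% of them. Analysis of the genes of known function with identical sequences shows that their lengths range from about 50 bp to about 1000 bp, with most below 500 bp. Functional analysis shows that most of the duplicated genes in the genomes studied are related to transposases, while a small number encode transferases, hydrolases, transmembrane proteins, repressor proteins and so on. Codon usage bias analysis of the identical-sequence genes among the duplicated genes in each genome shows that these multi-copy genes are located in a particular region of the genome and are clustered there, displaying clear shared features. This exploratory work offers a new line of approach for future research on prokaryotic genomes.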
Gene duplication is a general phenomenon in organisms. It is closely related to genome evolution and is an important driving force in the differentiation of genomes and genetic systems. To date, few studies have examined duplicated genes in prokaryotic genomes. Four prokaryotic genomes with different GC contents were downloaded from the RefSeq database. The CodonW program was used for codon usage analysis of the protein-coding genes, and the CD-hit program was used to identify duplicated genes at a threshold of 80%. The statistical results show that 2.77% to 7.03% of the protein-coding genes in the four genomes are duplicated. Further sequence analysis shows that the multi-copy genes of known function are below 1000 bp in length. Functional analysis shows that most of the multi-copy genes are related to transposons, with a small number of genes encoding transferases, hydrolytic enzymes, transmembrane proteins, repressor proteins, etc. Codon usage bias analysis indicates that most of the multi-copy genes are located in particular regions and exhibit regular intrinsic sequence features. The evolutionary mechanisms of these multi-copy genes merit further study in future work.
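As a rough illustration of the "identical sequence" part of the analysis described above, the following Python sketch groups coding sequences that are exactly identical and reports the fraction of genes belonging to such groups. It is not the authors' pipeline: the reported percentages come from CD-hit clustering at an 80% threshold, whereas this sketch only detects exact matches, so it would usually give a lower figure. The function names (`parse_fasta`, `identical_copy_groups`, `duplicated_fraction`) and the FASTA input format are assumptions made for the example.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def identical_copy_groups(records):
    """Map each sequence that occurs at least twice to the list of its gene IDs."""
    groups = defaultdict(list)
    for header, seq in records:
        groups[seq].append(header)
    return {seq: ids for seq, ids in groups.items() if len(ids) > 1}


def duplicated_fraction(records):
    """Fraction of genes that share an exactly identical sequence with another gene."""
    records = list(records)
    if not records:
        return 0.0
    groups = identical_copy_groups(records)
    n_duplicated = sum(len(ids) for ids in groups.values())
    return n_duplicated / len(records)
```

A typical use would be to read the coding-sequence FASTA file of one genome into a string, pass `parse_fasta(text)` to `duplicated_fraction`, and inspect the lengths of the keys returned by `identical_copy_groups` to see how long the multi-copy genes are.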
Source
《德州学院学报》
2014, No. 6, pp. 21-25, 51 (6 pages in total)
Journal of Dezhou University
Funding
National Natural Science Foundation of China (Grant No. 61302186)
Keywords
prokaryotic genome
duplicated gene
multi-copy protein-coding gene