The number of completely sequenced archaeal genomes has been sufficient for a large-scale bioinformatic study. We have conducted analyses for each coding region from 36 archaeal genomes using the original CGS algorith...The number of completely sequenced archaeal genomes has been sufficient for a large-scale bioinformatic study. We have conducted analyses for each coding region from 36 archaeal genomes using the original CGS algorithm by calculating the total GC content (G+C), GC content in first, second and third codon positions as well as in fourfold and twofold degenerated sites from third codon positions, levels of arginine codon usage (Arg2: AGA/G; Arg4: CGX), levels of amino acid usage and the entropy of amino acid content distribution. In archaeal genomes with strong GC pressure, arginine is coded preferably by GC-rich Arg4 codons, whereas in most of archaeal genomes with G+C〈0.6, arginine is coded preferably by AT-rich Arg2 codons. In the genome of Haloquadratum walsbyi, which is closely related to GC-rich archaea, GC content has decreased mostly in third codon positions, while Arg4〉〉Arg2 bias still persists. Proteomes of archaeal species carry characteristic amino acid biases: levels of isoleucine and lysine are elevated, while levels of alanine, histidine, glutamine and cytosine are relatively decreased. Numerous genomic and proteomic biases observed can be explained by the hypothesis of previously existed strong mutational AT pressure in the common predecessor of all archaea.展开更多
Here, we evaluate the contribution of two major biological processes--DNA replication and transcription--to mutation rate variation in human genomes. Based on analysis of the public human tissue transcriptomics data, ...Here, we evaluate the contribution of two major biological processes--DNA replication and transcription--to mutation rate variation in human genomes. Based on analysis of the public human tissue transcriptomics data, high-resolution replicating map of Hela cells and dbSNP data, we present significant correlations between expres- sion breadth, replication time in local regions and SNP density. SNP density of tissue-specific (TS) genes is sig- nificantly higher than that of housekeeping (HK) genes. TS genes tend to locate in late-replicating genomic re- gions and genes in such regions have a higher SNP density compared to those in early-replication regions. In addi- tion, SNP density is found to be positively correlated with expression level among HK genes. We conclude that the process of DNA replication generates stronger mutational pressure than transcription-associated biological processes do, resulting in an increase of mutation rate in TS genes while having weaker effects on HK genes. In contrast, transcription-associated processes are mainly responsible for the accumulation of mutations in highly-expressed HK genes.展开更多
文摘The number of completely sequenced archaeal genomes has been sufficient for a large-scale bioinformatic study. We have conducted analyses for each coding region from 36 archaeal genomes using the original CGS algorithm by calculating the total GC content (G+C), GC content in first, second and third codon positions as well as in fourfold and twofold degenerated sites from third codon positions, levels of arginine codon usage (Arg2: AGA/G; Arg4: CGX), levels of amino acid usage and the entropy of amino acid content distribution. In archaeal genomes with strong GC pressure, arginine is coded preferably by GC-rich Arg4 codons, whereas in most of archaeal genomes with G+C〈0.6, arginine is coded preferably by AT-rich Arg2 codons. In the genome of Haloquadratum walsbyi, which is closely related to GC-rich archaea, GC content has decreased mostly in third codon positions, while Arg4〉〉Arg2 bias still persists. Proteomes of archaeal species carry characteristic amino acid biases: levels of isoleucine and lysine are elevated, while levels of alanine, histidine, glutamine and cytosine are relatively decreased. Numerous genomic and proteomic biases observed can be explained by the hypothesis of previously existed strong mutational AT pressure in the common predecessor of all archaea.
基金supported by grants from the National Basic Research Program (973 Program 2006CB9-10401 and 2006CB910403) awarded to JY and SH
文摘Here, we evaluate the contribution of two major biological processes--DNA replication and transcription--to mutation rate variation in human genomes. Based on analysis of the public human tissue transcriptomics data, high-resolution replicating map of Hela cells and dbSNP data, we present significant correlations between expres- sion breadth, replication time in local regions and SNP density. SNP density of tissue-specific (TS) genes is sig- nificantly higher than that of housekeeping (HK) genes. TS genes tend to locate in late-replicating genomic re- gions and genes in such regions have a higher SNP density compared to those in early-replication regions. In addi- tion, SNP density is found to be positively correlated with expression level among HK genes. We conclude that the process of DNA replication generates stronger mutational pressure than transcription-associated biological processes do, resulting in an increase of mutation rate in TS genes while having weaker effects on HK genes. In contrast, transcription-associated processes are mainly responsible for the accumulation of mutations in highly-expressed HK genes.