Abstract
The number of completely sequenced archaeal genomes is now sufficient for a large-scale bioinformatic study. We analyzed every coding region from 36 archaeal genomes with the original CGS algorithm, calculating total GC content (G+C); GC content in first, second and third codon positions, as well as in fourfold and twofold degenerate sites at third codon positions; levels of arginine codon usage (Arg2: AGA/G; Arg4: CGX); levels of amino acid usage; and the entropy of the amino acid content distribution. In archaeal genomes under strong GC pressure, arginine is encoded preferentially by GC-rich Arg4 codons, whereas in most archaeal genomes with G+C < 0.6, arginine is encoded preferentially by AT-rich Arg2 codons. In the genome of Haloquadratum walsbyi, which is closely related to GC-rich archaea, GC content has decreased mostly in third codon positions, while the Arg4 >> Arg2 bias still persists. Proteomes of archaeal species carry characteristic amino acid biases: levels of isoleucine and lysine are elevated, while levels of alanine, histidine, glutamine and cysteine are relatively decreased. Many of the observed genomic and proteomic biases can be explained by the hypothesis that strong mutational AT pressure previously existed in the common ancestor of all archaea.
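The per-gene quantities listed above can be illustrated with a minimal Python sketch. This is not the CGS algorithm itself, whose implementation is not described here. The function names are hypothetical, the entropy is assumed to be Shannon entropy in bits, and arginine usage is reported as raw codon counts rather than normalized levels. GC content at fourfold and twofold degenerate sites is omitted for brevity.

```python
import math
from collections import Counter

# Arginine codon families as defined in the abstract.
ARG2 = {"AGA", "AGG"}                # AT-rich family (AGA/G)
ARG4 = {"CGA", "CGC", "CGG", "CGT"}  # GC-rich family (CGX)


def codons(cds):
    """Split a coding sequence into complete codons (trailing bases dropped)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc_by_position(cds):
    """Return (total GC, GC1, GC2, GC3) as fractions for one coding region."""
    cs = codons(cds)
    if not cs:
        return (0.0, 0.0, 0.0, 0.0)
    n = len(cs)
    per_pos = [sum(c[k] in "GC" for c in cs) / n for k in range(3)]
    # Each position contributes the same number of bases, so the mean
    # of the three positional values equals the total GC fraction.
    return (sum(per_pos) / 3, per_pos[0], per_pos[1], per_pos[2])


def arginine_usage(cds):
    """Count Arg2 (AGA/AGG) and Arg4 (CGX) codons in one coding region."""
    cs = codons(cds)
    return (sum(c in ARG2 for c in cs), sum(c in ARG4 for c in cs))


def shannon_entropy(protein):
    """Shannon entropy (bits) of the amino acid composition (assumed definition)."""
    counts = Counter(protein)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Applied to every coding region of a genome, these values could then be averaged or plotted against total G+C to look for trends such as the switch from Arg2 to Arg4 preference.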