Funding: Supported by the Fund for Open Projects of the Xinjiang Key Laboratory of Herbivore Nutrition for Meat & Milk Production.
Abstract: [Objective] Mitochondrial DNA D-loop sequences of Xinjiang Goose with three plumage colors were analyzed to study the genetic diversity, phylogeny, and evolution of the breed. [Method] Ten geese were selected at random from each of the core populations of grey-, mosaic-, and white-plumaged Xinjiang Goose, thirty birds in total, and blood samples were collected from the brachial vein (the largest vein under the wing) for DNA extraction. The mitochondrial DNA D-loop regions were sequenced to analyze polymorphism, and the genetic distances among the populations were estimated by comparison with reference sequences. [Result] The A, G, C, and T contents of the D-loop region of Xinjiang Goose were 28.85%, 17.05%, 25.38%, and 28.72%, respectively. The average haplotype diversity and nucleotide diversity of Xinjiang Goose were 0.583 and 0.056, respectively. Xinjiang Goose and Greylag Goose clustered into the same group. [Conclusion] The results indicate that Xinjiang Geese of all three plumage colors descend from Greylag Goose (Anser anser).
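The abstract does not say how its summary statistics were computed. As a minimal sketch, assuming the conventional definitions (Nei's haplotype diversity h = n/(n-1) * (1 - sum of squared haplotype frequencies), and nucleotide diversity as the mean proportion of differing sites over all sequence pairs), they could be computed from aligned sequences as follows. The function names and the toy sequences are illustrative only and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def base_composition(seqs):
    """Percentage of A, G, C and T pooled over all sequences."""
    counts = Counter(b for s in seqs for b in s.upper() if b in "AGCT")
    total = sum(counts.values())
    return {b: 100.0 * counts[b] / total for b in "AGCT"}


def haplotype_diversity(seqs):
    """Nei's estimator: h = n/(n-1) * (1 - sum(p_i^2)); needs n >= 2."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned,
    equal-length sequences (no gap handling in this sketch)."""
    diffs = [
        sum(a != b for a, b in zip(x, y)) / len(x)
        for x, y in combinations(seqs, 2)
    ]
    return sum(diffs) / len(diffs)


# Toy alignment (hypothetical data, not from the paper).
toy = ["ACGTAC", "ACGTAC", "ACGTTC", "ACCTTC"]
print(base_composition(toy))
print(haplotype_diversity(toy))
print(nucleotide_diversity(toy))
```

Dedicated population-genetics software typically also handles gaps and ambiguous bases, which this sketch ignores.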
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60575038) and the Natural Science Foundation of Jiangnan University, China (Grant No. 20070365).
Abstract: Chaos game representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, to determine the coordinates of their positions in a continuous space. This distribution of positions has two properties: it is unique, and the source sequence can be recovered from the coordinates, so the distance between positions may serve as a measure of similarity between the corresponding sequences. A CGR-walk model for DNA sequences is proposed based on CGR coordinates. The CGR coordinates are converted into a time series, and a long-memory ARFIMA(p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into DNA sequence analysis. The model is applied to simulate real CGR-walk sequence data from ten genomic sequences. Remarkably long-range correlations are uncovered in the data, and the data are fitted reasonably well by the ARFIMA(p, d, q) model.
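The iterative mapping and the recoverability property can be illustrated with a short sketch. It assumes the common CGR convention of a unit square with A, C, G, T at the corners (0,0), (0,1), (1,1), (1,0) and a starting point at the center; the corner assignment is an assumption, since the abstract does not state it. The conversion to a CGR-walk time series is not specified in the abstract and is not attempted here. Exact fractions are used so that decoding is not affected by rounding.

```python
from fractions import Fraction

# Assumed corner assignment (a common convention, not stated in the abstract).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BASE_AT = {corner: base for base, corner in CORNERS.items()}
HALF = Fraction(1, 2)


def cgr_points(seq):
    """Map a DNA string to its list of CGR coordinates.

    Each step moves halfway from the current point to the corner
    of the next nucleotide, starting from the center of the square.
    """
    x, y = HALF, HALF
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def decode(point, length):
    """Recover the last `length` nucleotides from a single CGR point."""
    x, y = point
    bases = []
    for _ in range(length):
        # The quadrant containing the point identifies the last base.
        cx, cy = int(x > HALF), int(y > HALF)
        bases.append(BASE_AT[(cx, cy)])
        # Invert the halving step to get the previous point.
        x, y = 2 * x - cx, 2 * y - cy
    return "".join(reversed(bases))


seq = "GATTACA"
pts = cgr_points(seq)
print(pts[0])                     # first point, after reading "G"
print(decode(pts[-1], len(seq)))  # the source sequence, recovered
```

With floating-point coordinates the same decoding works only for short sequences, because each step consumes one bit of precision per axis.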
Abstract: Stream ciphers, DNA cryptography, and DNA analysis are among the most important R&D fields in cryptography and bioinformatics. HC-256 is a new-generation stream cipher for advanced network security. From a random-sequence viewpoint, both HC-256 output and real DNA data may have intrinsic pseudo-random properties. Over the past decade, DNA sequencing projects on cells, plants, and animals around the world have produced huge DNA databases. Researchers have noted that mammalian genomes encode thousands of large noncoding RNAs (lncRNAs), which interact with chromatin regulatory complexes and are thought to play a role in localizing these complexes to target loci across the genome. Organizing such complex interactive properties as visual maps with higher-dimensional visualization tools is a challenge. This paper systematically proposes the Variant Map System (VMS), an emerging scheme that applies multiple maps built on four meta symbols, the same number used in DNA or RNA representations. The system architecture, key components, and core mechanism of the VMS are described, and the key modules, equations, and their I/O parameters are discussed. Using the VMS, two sets of real DNA sequences, from sample human (noncoding DNA) and corn (coding DNA) genomes, are compared with pseudo-DNA sequences generated by HC-256 to show their intrinsic properties and higher-level similarity relationships on 2D maps. Sample 2D maps are listed and their characteristics are illustrated under a controlled environment. The visual results are briefly analyzed to explore the intrinsic properties of the selected genome sequences.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 20174036, 20274040) and the Natural Science Foundation of Zhejiang Province, China (Nos. R404047, 10102).
Abstract: Using the complete genome of Plasmodium falciparum 3D7, which has 14 chromosomes, as an example, we examined the distribution functions of the C or G content and the A or T content in consecutive, non-overlapping blocks of m bases. The distribution P(S) of the size S of consecutive C-G or A-T content clusters follows the relation P(S) ∝ e^(−αS). The scaling exponent α_CG is much larger than α_AT; α_AT hardly changes across the 14 chromosomes, whereas α_CG fluctuates. The maximum A-T cluster size is much larger than the maximum C-G cluster size, which implies the existence of large A-T clusters. The width function ξ(m) of the C-G content follows a good power law, ξ(m) ∝ m^(−γ), with an average γ of 0.931 over the 14 chromosomes. These investigations provide some insight into the nucleotide clusters of DNA sequences and help us understand other properties of DNA sequences.
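As a rough illustration of how a cluster-size exponent might be estimated, the sketch below makes two assumptions that the abstract does not confirm: a "cluster" is taken to be a maximal run of consecutive bases from one class (C/G or A/T), and α is estimated by an ordinary least-squares fit of ln P(S) against S for an exponential form P(S) ∝ exp(−αS). The function names are hypothetical, and the authors' actual procedure may differ.

```python
import math
import re
from collections import Counter


def cluster_sizes(seq, letters):
    """Lengths of maximal runs made up only of the given letters."""
    pattern = f"[{letters}]+"
    return [len(m.group()) for m in re.finditer(pattern, seq.upper())]


def fit_exponent(sizes):
    """Estimate alpha in P(S) ~ exp(-alpha * S).

    Fits a straight line to ln P(S) versus S by least squares and
    returns the negated slope. Needs at least two distinct sizes.
    """
    counts = Counter(sizes)
    total = len(sizes)
    xs = sorted(counts)
    ys = [math.log(counts[s] / total) for s in xs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


# Toy sequence (hypothetical, not genome data).
toy_seq = "AATCGGCTTA"
print(cluster_sizes(toy_seq, "CG"))  # sizes of C/G runs
print(cluster_sizes(toy_seq, "AT"))  # sizes of A/T runs

# Synthetic sizes whose frequencies halve with each unit of S,
# so the fitted alpha should be close to ln 2.
synthetic = [1] * 8 + [2] * 4 + [3] * 2 + [4]
print(fit_exponent(synthetic))
```

On real chromosomes the tail of the distribution is sparse, so binning or a maximum-likelihood fit may be preferable to the simple regression used here.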