Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60575038) and the Natural Science Foundation of Jiangnan University, China (Grant No. 20070365).
Abstract: Chaos game representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to determine the coordinates of their positions in a continuous space. This distribution of positions has two features: it is unique, and the source sequence can be recovered from the coordinates, so that the distance between positions may serve as a measure of similarity between the corresponding sequences. A CGR-walk model is proposed based on the CGR coordinates of DNA sequences. The CGR coordinates are converted into a time series, and a long-memory ARFIMA(p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into DNA sequence analysis. This model is applied to simulating real CGR-walk sequence data of ten genomic sequences. Remarkably long-range correlations are uncovered in the data, and the results from these models fit reasonably well with those from the ARFIMA(p, d, q) model.
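The two properties named in the abstract (a unique position for each sequence, and recoverability of the sequence from its coordinates) can be illustrated with a minimal sketch of the classical CGR for DNA. The corner assignment, the starting point (0.5, 0.5), and the function names `cgr_coordinates` and `recover_sequence` are assumptions chosen for illustration; the abstract does not state which convention the paper uses, nor how the coordinates are turned into the CGR-walk time series.

```python
# Assumed corner convention for the unit square (not taken from the abstract).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(sequence, start=(0.5, 0.5)):
    """Return the CGR point reached after each base of the sequence."""
    x, y = start
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def recover_sequence(final_point, length):
    """Read the bases back from the last CGR point alone.

    Exact only while the coordinates fit in double precision
    (roughly the last 50 bases); longer sequences need exact arithmetic.
    """
    x, y = final_point
    bases = []
    for _ in range(length):
        # The quadrant containing the point identifies the most recent base.
        cx = 1.0 if x >= 0.5 else 0.0
        cy = 1.0 if y >= 0.5 else 0.0
        bases.append(next(b for b, c in CORNERS.items() if c == (cx, cy)))
        # Undo the halving step to obtain the previous point.
        x, y = 2.0 * x - cx, 2.0 * y - cy
    return "".join(reversed(bases))


points = cgr_coordinates("ACGT")
print(points[-1])                       # last CGR point
print(recover_sequence(points[-1], 4))  # sequence read back from that point
```

Because each step halves the distance to a corner, sequences that share a long suffix end at nearby points, which is why distance between positions can act as a similarity measure.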
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60575038), the Natural Science Foundation of Jiangnan University, China (Grant No. 20070365), and the Program for Innovative Research Team of Jiangnan University, China.
Abstract: A new chaos game representation (CGR) of protein sequences based on the detailed hydrophobic-hydrophilic (HP) model has been proposed by Yu et al. (Physica A 337 (2004) 171). In the present paper, a CGR-walk model is proposed based on the new CGR coordinates for the protein sequences from complete genomes. The new CGR coordinates based on the detailed HP model are converted into a time series, and a long-memory ARFIMA(p, d, q) model is introduced into protein sequence analysis. This model is applied to simulating real CGR-walk sequence data of twelve protein sequences. Remarkably long-range correlations are uncovered in the data, and the results obtained from these models are reasonably consistent with those available from the ARFIMA(p, d, q) model.
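What distinguishes ARFIMA(p, d, q) from an ordinary ARIMA model is that the differencing order d may be fractional; for 0 < d < 0.5 the process is stationary with long memory. The sketch below shows only the fractional differencing operator (1 - B)^d, expanded as a series in the backshift operator B. The function names are hypothetical, and the abstract does not describe how the authors estimate d or the ARMA part, so none of that is reproduced here.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients pi_k in (1 - B)^d = sum_k pi_k B^k, with pi_0 = 1.

    Uses the recursion pi_k = pi_{k-1} * (k - 1 - d) / k.
    """
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply the filter (1 - B)^d, truncated at the start of the series."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


# For fractional d the weights decay slowly (hyperbolically), which is
# the source of the long-memory behaviour.
print(frac_diff_weights(0.5, 4))
# With d = 1 the filter reduces to ordinary first differencing.
print(frac_diff([1.0, 3.0, 6.0, 10.0], 1))
```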
Funding: Project supported by the Fundamental Research Funds for the Central Universities (Grant No. JUSRP21117) and the Program for Innovative Research Team of Jiangnan University (Grant No. 2008CX002).
Abstract: Influenza pandemics have been regarded as major disasters throughout human history, so the influenza virus has become an important subject of study for many experts and scholars. Extensive research has been carried out over the years on the biological properties, chemical characteristics, external environmental factors, and other aspects of the virus, and some results have been achieved. Based on the chaos game representation (CGR) walk model, this paper uses time series analysis to study the DNA sequences of the influenza virus from 1913 to 2010 and derives an early-warning indicator value for the outbreak of an influenza pandemic. The variances of the CGR walk sequences for the pandemic years (or within ±1 to 2 years of them) are significantly higher than those for the adjacent years, while those for the non-pandemic years are usually smaller. In this way an influenza early-warning mechanism can be provided, so that people can take precautions and be well prepared before a pandemic.
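The variance comparison described in the abstract can be sketched as follows, under stated assumptions. The abstract gives neither the exact indicator nor a significance test, so the threshold `ratio`, the function names, and the rule "exceeds both neighbours by a factor" are hypothetical stand-ins. The input values are synthetic and the year labels 1 to 3 are placeholders, not data from the paper.

```python
from statistics import pvariance


def variance_by_year(walks):
    """Map each year to the population variance of its CGR-walk series."""
    return {year: pvariance(values) for year, values in walks.items()}


def flag_peaks(variances, ratio=2.0):
    """Return years whose variance exceeds `ratio` times both neighbours.

    Neighbours are the previous and next entries in sorted order, so the
    years supplied are assumed to be consecutive.
    """
    years = sorted(variances)
    flagged = []
    for prev, cur, nxt in zip(years, years[1:], years[2:]):
        if variances[cur] > ratio * max(variances[prev], variances[nxt]):
            flagged.append(cur)
    return flagged


# Synthetic walk values, for illustration only.
walks = {
    1: [0.0, 0.1, 0.0, 0.1],
    2: [0.0, 1.0, 0.0, 1.0],
    3: [0.0, 0.1, 0.0, 0.1],
}
print(flag_peaks(variance_by_year(walks)))
```

A fixed ratio is the simplest possible rule; a statistical test on the variances would be a natural replacement if the per-year series are long enough.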