Line T240-6, derived from an SC2 young embryo of the cross CA9211/RW15 (a 6D/6V alien substitution line), was identified by genomic in situ hybridization (GISH) as a telosomic substitution line for 6VS. The 6VS arm was microdissected with a needle and transferred into a 0.5 mL Eppendorf tube, and all subsequent steps were carried out in this single tube. After two rounds of linker adaptor (LA)-PCR amplification, the PCR products ranged from 100 to 3 000 bp, with predominant bands at 600-1 500 bp. The products were confirmed by Southern blotting using 32P-labeled Haynaldia villosa (L.) Schur. genomic DNA as the probe. The PCR products were purified and ligated into the pGEM-T Easy cloning vector, and the plasmids were transformed into competent E. coli JM109 cells prepared with cold CaCl2. The library was estimated to contain more than 17 000 white clones. Insert sizes ranged from 100 to 1 500 bp, with an average of 600 bp. With H. villosa genomic DNA as the probe, dot blotting showed that 37% of the clones gave strong or medium positive signals and 63% gave faint or no signals, indicating that about 37% of the clones in the library carry repeat sequences and about 63% carry single-copy/unique sequences. Eight H. villosa-specific clones were screened from the library, and two of them, pHVMK22 and pHVMK134, were used for RFLP analysis and sequencing. Both were H. villosa-specific; pHVMK22 was a unique-sequence clone and pHVMK134 a repeat-sequence clone. When pHVMK22 was used as a probe for Southern hybridization, all powdery mildew-resistant materials showed a specific 2 kb band, whereas none of the susceptible materials did. pHVMK22 may therefore be useful for detecting the presence of Pm21.
Recent work has revealed that the genomes of polyploid wheat contain a class of low-copy, chromosome-specific sequences that are labile upon polyploid formation. This class of sequences was proposed to play a critical role in the stabilization and establishment of nascent plant polyploids as new species. To study this issue further, five wheat chromosome 7B-specific sequences, isolated from common wheat (Triticum aestivum L.) by chromosome microdissection, were characterized by genomic Southern hybridization on a collection of polyploid wheats and their diploid progenitors. Four sequences hybridized to all polyploid species but, at the diploid level, only to species closely related to the B genome of polyploid wheat. This indicates that these sequences originated with the divergence of the diploid species and were then vertically transmitted to the polyploids. One sequence hybridized to all species at both the diploid and polyploid levels, suggesting its elimination after the formation of polyploid wheat. The hybridization of this sequence to two synthetic polyploid wheats indicated that sequence elimination is a rapid event and is probably related to the methylation status of the sequence. Based on these results, we suggest that selective changes of low-copy sequences occur rapidly after polyploid formation, which may contribute to the differentiation of chromosomes in newly formed allopolyploid wheats.
[Objective] The research aimed to construct a discriminant classification model for DNA sequences by combining biological knowledge with mathematical methods. [Method] According to the polarity of amino acid side chains, amino acid classification information, which reflects sequence characteristics in terms of both base content and base arrangement, was extracted from sequences with different amino acid contents and represented as a four-dimensional vector. The Mahalanobis distance and Fisher discriminant methods were then used to classify the given sequences. [Result] In the model, the back-substitution rates obtained with the two classification methods were both 100%, and the two methods agreed on 90% of the classifications. [Conclusion] The calculation method of the model is simple, and its classification accuracy is higher than that of a discriminant classification model based only on base content.
Conogethes punctiferalis (Guenée) (Lepidoptera: Crambidae) was originally considered a single species with a fruit-feeding type (FFT) and a Pinaceae-feeding type (PFT), but it has since been divided into two species, Conogethes punctiferalis and Conogethes pinicolalis. The relationship between the two species was investigated by phylogenetic reconstruction using maximum-likelihood (ML) parameter estimation. The phylogenetic tree and network were constructed from concatenated sequence data for three mitochondrial genes, cytochrome c oxidase subunits I and II and cytochrome b, derived from 118 samples of C. punctiferalis and 24 samples of C. pinicolalis. The tree and network showed that conspecific sequences clustered together despite intraspecific variability. Here we report a combined analysis of mitochondrial DNA sequences from the three genes and morphological data, which provides strong evidence that C. pinicolalis and C. punctiferalis are significantly different.
Chaos game representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, to determine the coordinates of their positions in a continuous space. This distribution of positions has two features: it is unique, and the source sequence can be recovered from the coordinates, so that the distance between positions may serve as a measure of similarity between the corresponding sequences. A CGR-walk model based on CGR coordinates is proposed for DNA sequences. The CGR coordinates are converted into a time series, and a long-memory ARFIMA(p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into DNA sequence analysis. This model is applied to simulating real CGR-walk sequence data from ten genomic sequences. Remarkably long-range correlations are uncovered in the data, and the results are fitted reasonably well by the ARFIMA(p, d, q) model.
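The iterative mapping described in this abstract can be illustrated with a minimal Python sketch. The corner assignment for A, C, G and T, the starting point (0.5, 0.5) and the function name `cgr_coordinates` are assumptions chosen for illustration (a common CGR convention), not details taken from the paper, and the conversion of the coordinates into a CGR-walk time series is not reproduced here.

```python
# Corner assignment is an assumed, commonly used convention,
# not taken from the abstract.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(sequence, start=(0.5, 0.5)):
    """Return the CGR point reached after each nucleotide of `sequence`."""
    x, y = start
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        corner_x, corner_y = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + corner_x) / 2.0, (y + corner_y) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_coordinates("ACGT"):
        print(point)
```

Because each step halves the distance to a corner, the last point encodes the whole preceding sequence, which is the recoverability property the abstract mentions.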
In this paper, the Adomian decomposition method (ADM), which offers high accuracy and fast convergence, is used to solve a fractional-order piecewise-linear (PWL) hyperchaotic system. Based on the resulting hyperchaotic sequences, a novel color image encryption algorithm is proposed that employs a hybrid model of bidirectional circular permutation and DNA masking. In this scheme, the pixel positions of the image are scrambled by circular permutation, and the pixel values are substituted by DNA sequence operations. In the DNA sequence operations, addition and subtraction are performed according to conventional binary addition and subtraction, and two rounds of addition rules are used to encrypt the pixel values. Simulation results and security analysis show that the hyperchaotic map is suitable for image encryption and that the proposed algorithm has a good encryption effect and strong key sensitivity. It can resist brute-force, statistical, differential, known-plaintext, and chosen-plaintext attacks.
Recently, many researchers have used nature-inspired metaheuristic algorithms because of their ability to perform well on complex problems. Among these, the bat algorithm has recently become popular as a simple way to solve problems, because it tends to converge to the global optimum most of the time. However, the standard bat algorithm with a random walk can still get stuck in local minima. To solve this problem, this research proposes a bat algorithm with a Levy flight random walk. The proposed Bat with Levy flight algorithm is then hybridized with three different variants of ANN. The proposed BatLFBP is applied to the problem of insulin DNA sequence classification for healthy Homo sapiens. For classification performance, the proposed models, Bat Levy Flight Artificial Neural Network (BatLFANN) and Bat Levy Flight Back Propagation (BatLFBP), are compared with other state-of-the-art algorithms, namely Bat Artificial Neural Network (BatANN), Bat Back Propagation (BatBP), Bat Gaussian Distribution Artificial Neural Network (BatGDANN), and Bat Gaussian Distribution Back Propagation (BatGDBP), in terms of mean squared error (MSE) and accuracy. The simulation results show that on WL5 the proposed BatLFANN achieved 99.88153% accuracy with an MSE of 0.001185, and BatLFBP achieved 99.834185% accuracy with an MSE of 0.001658. On WL10, BatLFANN achieved 99.89899% accuracy with an MSE of 0.00101, and BatLFBP achieved 99.84473% accuracy with an MSE of 0.004553. Similarly, on WL15, BatLFANN achieved 99.82853% accuracy with an MSE of 0.001715, and BatLFBP achieved 99.3262% accuracy with an MSE of 0.006738. These accuracies are better than those of the other hybrid models.
In recent years, the convolutional neural network, a deep learning model able to extract high-level abstract features from minimally preprocessed data, has been widely used. In this research, we propose a new approach to classifying DNA sequences with a convolutional neural network that treats the sequences as text data. We use one-hot vectors to represent the sequences as input to the model, which preserves the essential position information of each nucleotide. We evaluated the proposed model on 12 DNA sequence datasets and achieved significant improvements on all of them. This result shows the potential of convolutional neural networks for DNA sequences and for other sequence problems in bioinformatics.
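The one-hot representation mentioned in this abstract can be sketched in a few lines of Python. The column order `ACGT`, the all-zero row for unknown symbols and the function name `one_hot` are assumptions made for illustration; the abstract does not specify how the authors encode ambiguous bases or lay out the input.

```python
ALPHABET = "ACGT"  # assumed column order


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element 0/1 rows, one per position.

    Symbols outside ALPHABET (for example N) become an all-zero row,
    which is an illustrative choice rather than the paper's rule.
    """
    return [
        [1 if base == letter else 0 for letter in ALPHABET]
        for base in sequence.upper()
    ]


if __name__ == "__main__":
    for row in one_hot("ACGTN"):
        print(row)
```

Each row marks which nucleotide occupies that position, so the row index carries the position information that the abstract says is preserved.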
Objective: To investigate DNA sequence changes in the p21WAF1/CIP1 gene and their relationship with the phenotype of human osteosarcoma. Methods: p21WAF1/CIP1 gene DNA from 36 osteosarcoma specimens was examined by polymerase chain reaction-single strand conformation polymorphism (PCR-SSCP), and the PCR products were sequenced directly. Results: In exon 3 of the p21WAF1/CIP1 gene in the 36 cases of human osteosarcoma, a C→T change at position 609 of the p21WAF1/CIP1 cDNA sequence occurred in 17 cases, a reported incidence of 44.4%. In 10 normal blood samples, DNA sequence analysis showed the same C→T change at position 609 of the cDNA sequence in 8 cases, an incidence of 80%. Conclusion: A novel polymorphic site, rather than a mutation, of the p21WAF1/CIP1 gene in osteosarcoma was defined, and this site may provide a meaningful reference for further research on the p21WAF1/CIP1 gene. [Chinese title: DNA sequence analysis of the p21WAF1/CIP1 gene and its relationship with the osteosarcoma phenotype]
A new version of DNA walks is introduced in which nucleotides are regarded as unequal in their contribution to a walk, which allows the "fine structure" of nucleotide sequences to be studied thoroughly. The approach is based on the assumption that nucleotides have an inner abstract characteristic, the determinative degree, which reflects phenomenological properties of the genetic code and is adjusted to the physical properties of nucleotides. We consider each codon position independently, which gives three separate walks characterized by different angles and lengths; such an object is called a triander, and it reflects the "strength" of each branch. A general method is proposed for identifying a DNA sequence by its triander, which can be treated as a unique "genogram" (or "gene passport"). Two- and three-dimensional trianders are considered. The difference in sequence fine structure between genes and the intergenic space is shown. A clear triplet signal was found in coding sequences; it is absent in the intergenic space and is independent of sequence length. This paper presents a topological classification of trianders, which may allow detailed signatures of functionally different genomic regions to be worked out.
Active Motif Finder (AMF) is a novel algorithmic tool designed around mutations in DNA sequences. Currently available motif-finding tools are based on matching a given motif in the query sequence. AMF implements a new algorithm that identifies occurrences of patterns carrying all kinds of mutations, such as insertions, deletions and mismatches. The algorithm is based mainly on computing an Alignment Score Matrix (ASM) by comparing the input motif with the full-length sequence. Much of the effort in bioinformatics is directed at identifying such motifs in the sequences of newly discovered genes. The proposed tool serves as an open resource for analysis and is useful for studying polymorphisms in DNA sequences. AMF can be searched via a user-friendly interface. The tool is intended to serve the scientific community working in chemical and structural biology and is freely available to all users at http://www.sastra.edu/scbt/amf/.
Using the complete genome of Plasmodium falciparum 3D7, which has 14 chromosomes, as an example, we have examined the distribution functions for the C or G and A or T content of consecutive, non-overlapping blocks of m bases. The distribution P(S) of the number of consecutive C-G or A-T clusters of size S conforms to the relation P(S) ∝ e^(-αS). The values of the scaling exponent α_CG are much larger than those of α_AT; α_AT hardly changes across the 14 chromosomes, whereas α_CG fluctuates. We found that the maximum A-T cluster size is much larger than the maximum C-G cluster size, which implies the existence of large A-T clusters. Our study of the width function ξ(m) of the C-G content showed that it follows a good power law, ξ(m) ∝ m^(-γ). The average γ for the 14 chromosomes is 0.931. These investigations provide some insight into the nucleotide clusters of DNA sequences and help us understand other properties of DNA sequences.
This paper investigates the existence of low-dimensional deterministic chaos in the AT and GC skew profiles of DNA sequences, using DNA sequences from eight organisms as samples. The skew profiles are analysed using the continuous wavelet transform and then nonlinear time series methods. Two invariant measures, the correlation dimension and the largest Lyapunov exponent, are calculated. The AT and GC skew profiles of these DNA sequences are all shown to exhibit low-dimensional chaotic behaviour, which suggests that chaotic properties may be ubiquitous in the DNA sequences of all organisms.
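For readers unfamiliar with skew profiles, the following Python sketch computes windowed GC and AT skew under the standard definitions, (G - C)/(G + C) and (A - T)/(A + T). The abstract does not state the window scheme, so the non-overlapping windows, the zero value for empty windows and the function names are assumptions; the wavelet and nonlinear time series analyses are not reproduced.

```python
def skew_profile(sequence, window, first, second):
    """Skew (first - second) / (first + second) in non-overlapping windows.

    Windows that contain neither base get a skew of 0.0 (an assumed choice).
    A trailing partial window is ignored.
    """
    sequence = sequence.upper()
    profile = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        a, b = chunk.count(first), chunk.count(second)
        profile.append((a - b) / (a + b) if a + b else 0.0)
    return profile


def gc_skew(sequence, window):
    """Windowed GC skew, (G - C) / (G + C)."""
    return skew_profile(sequence, window, "G", "C")


def at_skew(sequence, window):
    """Windowed AT skew, (A - T) / (A + T)."""
    return skew_profile(sequence, window, "A", "T")


if __name__ == "__main__":
    print(gc_skew("GGCCGGGGATATCCCC", 4))
    print(at_skew("GGCCGGGGATATCCCC", 4))
```

The resulting list is the kind of one-dimensional profile that can then be passed to time series methods.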
In this paper we study the scaling behavior of nucleotide clusters in the 11 chromosomes of the Encephalitozoon cuniculi genome. The statistical distribution of nucleotide clusters for the 11 chromosomes is characterized by the scaling behavior P(S) ∝ e^(-αS), where S represents nucleotide cluster size. The cluster-size distribution P(S1+S2), where S1+S2 is the total size of a sequential C-G cluster and A-T cluster, was also studied; P(S1+S2) follows exponential decay. No case was found of a large C-G cluster following a large A-T cluster, or of a large A-T cluster following a large C-G cluster. We also discuss the relatively random walk length function L(n) and the local compositional complexity of nucleotide sequences based on a new model. These investigations may provide some insight into the nucleotide clusters of DNA sequences.
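A cluster-size distribution of the kind described here can be tabulated with a short Python sketch. It assumes that a cluster is a maximal run of consecutive bases from the same class (C/G or A/T) and that other symbols are dropped before counting; both points, and the function name `cluster_sizes`, are illustrative assumptions, since the abstract does not give the exact definition.

```python
from collections import Counter
from itertools import groupby


def base_class(base):
    """Map a nucleotide to its class label, 'CG' or 'AT'."""
    return "CG" if base in "CG" else "AT"


def cluster_sizes(sequence):
    """Count maximal runs of C/G bases and of A/T bases by run length.

    Returns {"CG": Counter, "AT": Counter}, each mapping size S to the
    number of clusters of that size. Non-ACGT symbols are dropped first
    (an assumed simplification).
    """
    sizes = {"CG": Counter(), "AT": Counter()}
    bases = [b for b in sequence.upper() if b in "ACGT"]
    for label, run in groupby(bases, key=base_class):
        sizes[label][len(list(run))] += 1
    return sizes


if __name__ == "__main__":
    counts = cluster_sizes("AATCGGATTTTACGCGC")
    print(dict(counts["AT"]), dict(counts["CG"]))
```

Normalizing each counter by its total gives an empirical P(S), and the slope of log P(S) against S would give an estimate of α under the exponential form quoted above.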
The univalent in meiotic metaphase spreads of the F1 (Z2 × wheat variety Wan7107) was identified by GISH as the Agropyrum intermedium 2Ai-2 chromosome. The 2Ai-2 chromosomes were microisolated and collected. After two rounds of PCR amplification, the PCR products ranged from 150 to 3 000 bp, with predominant fragments of about 200-2 000 bp. Southern blotting with Ag. intermedium genomic DNA as the probe confirmed that the products originated from the Ag. intermedium genome. The products were purified, ligated into pUC18 and transformed into competent E. coli DH5α cells to produce a 2Ai-2 chromosome DNA library. The microcloning experiments produced approximately 5 × 10^5 clones; the cloned inserts ranged from 200 to 1 500 bp, with an average of 580 bp. Dot blotting with Ag. intermedium genomic DNA as the probe showed that 56% of the clones in the library are unique/low-copy sequences and 44% are repetitive sequences. Four Ag. intermedium clones were screened from the library by RFLP: three (Mag065, Mag088, Mag139) are low/single-copy sequences and one (Mag104) is a repetitive sequence. GISH results indicated that Mag104 is an Ag. intermedium species-specific repetitive DNA sequence.
The characterization of long-range correlations and fractal properties of DNA sequences has proved to be a difficult though rewarding task, mainly because of the mosaic character of DNA, which consists of many patches of various lengths with different nucleotide compositions. In this paper we investigate statistical correlations among different positions in DNA sequences using the two-dimensional DNA walk. The root-mean-square fluctuation F(l) is described by a power law. The autocorrelation function C(l), which is used to measure linear dependence and periodicity, follows a power law C(l) ~ l^(-μ). We also calculate the mean-square distance <R^2(l)> along the DNA chain, which may be expressed as <R^2(l)> ~ l^γ with 2 > γ > 1. Our investigations can provide some insights into long-range correlations in DNA sequences.
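A two-dimensional DNA walk and its mean-square distance can be sketched in Python as follows. The step assignment (A left, T right, C down, G up) is a hypothetical mapping chosen for illustration, as the abstract does not say which convention the authors use; `dna_walk` and `mean_square_distance` are likewise illustrative names.

```python
# Hypothetical step assignment; the abstract does not specify the mapping.
STEPS = {"A": (-1, 0), "T": (1, 0), "C": (0, -1), "G": (0, 1)}


def dna_walk(sequence):
    """Return the walk positions, starting at the origin."""
    x = y = 0
    path = [(0, 0)]
    for base in sequence.upper():
        dx, dy = STEPS.get(base, (0, 0))  # unknown symbols do not move
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def mean_square_distance(path, lag):
    """Average squared displacement over all pairs of points `lag` steps apart."""
    count = len(path) - lag
    if lag < 1 or count < 1:
        raise ValueError("lag must be between 1 and len(path) - 1")
    total = 0
    for i in range(count):
        dx = path[i + lag][0] - path[i][0]
        dy = path[i + lag][1] - path[i][1]
        total += dx * dx + dy * dy
    return total / count


if __name__ == "__main__":
    walk = dna_walk("ATGCGGTTACGT")
    for lag in (1, 2, 4):
        print(lag, mean_square_distance(walk, lag))
```

Plotting the mean-square distance against the lag l on log-log axes is one way to estimate an exponent such as the γ quoted in the abstract; an uncorrelated walk would give an exponent near 1.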
This paper presents a model of the alternating current (AC) conductivity of DNA sequences, in which DNA is treated as a one-dimensional (1D) disordered system and electrons are transported by hopping between localized states. The AC conductivity of DNA sequences is found to increase as the frequency of the external electric field rises, taking the form σ_ac(ω) ~ ω^2 ln^2(1/ω). The AC conductivity also increases with temperature, although the temperature dependence is weak. Meanwhile, at low temperatures the AC conductivity in the off-diagonally correlated case is much larger than in the uncorrelated case of the Anderson limit, which indicates that off-diagonal correlations in DNA sequences have a great effect on the AC conductivity, whereas at high temperatures the off-diagonal correlations no longer play a vital role in electric transport. In addition, the proportion of nucleotide pairs, p, also plays an important role in AC electron transport in DNA sequences: for p < 0.5 the conductivity decreases as p increases, while for p ≥ 0.5 it increases with p.
Evidence seems to show that coding DNA is more random than noncoding DNA, but conflicting evidence also exists. Based on the third-base degeneracy of codons, we regard the third codon position as a 'noisy' position. By deleting one fixed position from each non-overlapping triplet in a given sequence, three masked sequences can be derived from it. We have investigated the block-to-site mutual information functions of coding and noncoding sequences in yeast with and without the masking, and have found characteristics that distinguish coding from noncoding DNA. It is observed that the strong correlations in coding regions may be blocked by the third base of codons, and that proper masking can extract these correlations. The distribution of dimeric tandem repeats in unmasked sequences is also compared with that in masked sequences.
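The masking step described in this abstract, deleting one fixed position from every non-overlapping triplet, can be sketched in Python. The sketch assumes that triplets are counted from the first base of the input (reading frame 0); the function names are illustrative, and the mutual information analysis itself is not reproduced.

```python
def mask_position(sequence, position):
    """Delete triplet position 0, 1 or 2 from every non-overlapping triplet.

    Triplets are counted from the first base of `sequence`
    (an assumed reading frame).
    """
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2")
    return "".join(
        base for index, base in enumerate(sequence) if index % 3 != position
    )


def masked_sequences(sequence):
    """Return the three masked sequences, one per deleted triplet position."""
    return [mask_position(sequence, position) for position in range(3)]


if __name__ == "__main__":
    for masked in masked_sequences("ATGCCATGA"):
        print(masked)
```

In a coding sequence read in frame, `mask_position(sequence, 2)` removes the third codon position, the one the abstract treats as 'noisy'.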
This technical note aims to show how any instructor teaching entomology can use the Basic Local Alignment Search Tool (BLAST) and the "one click" mode of Phylogeny.fr to teach undergraduate students about insect DNA similarity in a simple way. Teaching an entomology course requires the use of numerous tools to help students grasp different concepts. Knowing that there are more than one million described species of insects means that teaching students about insect identification and taxonomy can be challenging. However, here we present two easy exercises that could be used as classroom or take-home assignments to demonstrate various levels of DNA similarity among different insect taxa. Such exercises unlock students' creativity and break the barrier of fear of bioinformatics. Moreover, they open up new ways for them to understand insect taxonomy through molecular biology and allow them to develop new skills that contribute to strengthening their scientific performance in the future, especially when they do research as graduate students. Finally, this note is an example of how to integrate simple bioinformatics tools into the teaching of entomology.
Funding (6VS microdissection library study): National "863" Program of China (Z170401); National Program for Transgenic Plant Research and Industrialization (J00A002).
Funding (DNA sequence discriminant classification study): Supported by the Science Research Project of Ningbo Dahongying University in 2011 (CF102601).
Funding (Conogethes phylogeny study): Supported by the China Agriculture Research System (CARS-02) and the Beijing Municipal Sci-Tech Program (Z111100056811009).
Funding (CGR-walk study): Project supported by the National Natural Science Foundation of China (Grant No 60575038) and the Natural Science Foundation of Jiangnan University, China (Grant No 20070365).
Funding (hyperchaotic image encryption study): Supported by the National Natural Science Foundation of China (Grant Nos. 61161006 and 61573383).
Funding: This research is supported by Tier-1 Research Grant, vote no. H938, from the Research Management Office (RMC), Universiti Tun Hussein Onn Malaysia, and the Ministry of Higher Education, Malaysia.
Abstract: Recently, many researchers have used nature-inspired metaheuristic algorithms because of their ability to perform well on complex problems. Among these, the bat algorithm has become popular in recent years because it tends to converge to the global optimum most of the time. However, the standard bat algorithm with a random walk can still get stuck in local minima. To solve this problem, this research proposes a bat algorithm with a Levy flight random walk. The proposed Bat with Levy flight algorithm is then hybridized with three different variants of ANN. The proposed BatLFBP is applied to the problem of insulin DNA sequence classification for healthy Homo sapiens. For classification performance, the proposed models, Bat Levy Flight Artificial Neural Network (BatLFANN) and Bat Levy Flight Back Propagation (BatLFBP), are compared with other state-of-the-art algorithms, namely Bat Artificial Neural Network (BatANN), Bat Back Propagation (BatBP), Bat Gaussian Distribution Artificial Neural Network (BatGDANN), and Bat Gaussian Distribution Back Propagation (BatGDBP), in terms of mean squared error (MSE) and accuracy. The simulation results show that the proposed BatLFANN achieved 99.88153% accuracy with an MSE of 0.001185, and BatLFBP achieved 99.834185% accuracy with an MSE of 0.001658, on WL5. On WL10, the proposed BatLFANN achieved 99.89899% accuracy with an MSE of 0.00101, and BatLFBP achieved 99.84473% accuracy with an MSE of 0.004553. Similarly, on WL15 the proposed BatLFANN achieved 99.82853% accuracy with an MSE of 0.001715, and BatLFBP achieved 99.3262% accuracy with an MSE of 0.006738. Both models achieve better accuracy than the other hybrid models.
Abstract: In recent years, the convolutional neural network, a deep learning model able to extract high-level abstract features from minimally preprocessed data, has been widely used. In this research, we propose a new approach to classifying DNA sequences using a convolutional neural network while treating these sequences as text data. We use one-hot vectors to represent sequences as input to the model; this representation therefore conserves the essential position information of each nucleotide in a sequence. Using 12 DNA sequence datasets, we evaluated our proposed model and achieved significant improvements on all of them. This result shows the potential of using convolutional neural networks on DNA sequences to solve other sequence problems in bioinformatics.
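The one-hot representation mentioned in this abstract can be sketched as follows. The column order A, C, G, T and the all-zero row for unknown symbols (such as N) are assumptions for illustration; the abstract does not describe how the authors handle either.

```python
ALPHABET = "ACGT"  # assumed column order


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one row per nucleotide.

    Row i keeps the position of nucleotide i, so positional information
    is preserved for a downstream convolutional layer.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:  # assumption: unknown symbols stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows
```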
Abstract: Objective: To investigate DNA sequence changes in the p21WAF1/CIP1 gene and their relationship with the phenotype of human osteosarcoma. Methods: p21WAF1/CIP1 gene DNA from 36 osteosarcoma specimens was examined using the polymerase chain reaction-single strand conformation polymorphism (PCR-SSCP) method. The PCR products were sequenced directly. Results: In exon 3 of the p21WAF1/CIP1 gene in 36 cases of human osteosarcoma, a C→T change at position 609 of the p21WAF1/CIP1 cDNA sequence occurred in 17 cases, an incidence of 44.4%. In 10 normal blood samples, DNA sequence analysis showed that the C→T change at position 609 of the p21WAF1/CIP1 cDNA sequence occurred in 8 cases, an incidence of 80%. Conclusion: A novel polymorphism site, rather than a mutation, was defined in the p21WAF1/CIP1 gene in osteosarcoma, and this site may provide a meaningful reference for further research on the p21WAF1/CIP1 gene. Chinese title: DNA sequence analysis of the p21WAF1/CIP1 gene and its relationship with the osteosarcoma phenotype.
Abstract: A new version of DNA walks, in which nucleotides are regarded as unequal in their contribution to a walk, is introduced, which allows us to study thoroughly the “fine structure” of nucleotide sequences. The approach is based on the assumption that nucleotides have an inner abstract characteristic, the determinative degree, which reflects phenomenological properties of the genetic code and is adjusted to the physical properties of nucleotides. We consider each codon position independently, which gives three separate walks characterized by different angles and lengths; such an object is called a triander, which reflects the “strength” of each branch. A general method for identifying a DNA sequence “by triander”, which can be treated as a unique “genogram” (or “gene passport”), is proposed. Two- and three-dimensional trianders are considered. The difference in sequence fine structure between genes and the intergenic space is shown. A clear triplet signal was found in coding sequences; it is absent in the intergenic space and is independent of sequence length. This paper presents a topological classification of trianders, which may allow signatures of functionally different genomic regions to be worked out in detail.
Abstract: Active Motif Finder (AMF) is a novel algorithmic tool designed around mutations in DNA sequences. Tools currently available for finding motifs are based on matching a given motif in the query sequence. AMF uses a new algorithm that identifies occurrences of patterns containing all kinds of mutations, such as insertions, deletions, and mismatches. The algorithm is mainly based on computing the Alignment Score Matrix (ASM) by comparing the input motif with the full-length sequence. Much of the effort in bioinformatics is directed at identifying these motifs in the sequences of newly discovered genes. The proposed bio-tool serves as an open resource for analysis and is useful for studying polymorphisms in DNA sequences. AMF can be searched via a user-friendly interface. This tool is intended to serve the scientific community working in the areas of chemical and structural biology, and is freely available to all users at http://www.sastra.edu/scbt/amf/.
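The abstract does not give the details of AMF's Alignment Score Matrix, so the following is not the authors' algorithm. It is a minimal sketch of a standard alternative, approximate matching by edit-distance dynamic programming (Sellers' algorithm), which likewise tolerates insertions, deletions, and mismatches when a motif is compared against a full-length sequence. The function name and the `max_edits` parameter are hypothetical.

```python
def approximate_matches(motif, sequence, max_edits):
    """Return (end_position, edits) for every end position in `sequence`
    (1-based) where `motif` matches some substring with at most `max_edits`
    insertions, deletions, or mismatches.
    """
    m = len(motif)
    prev = list(range(m + 1))  # column for the empty sequence prefix
    hits = []
    for j, ch in enumerate(sequence, start=1):
        cur = [0]  # a match may start anywhere, so an empty motif prefix costs 0
        for i in range(1, m + 1):
            cost = 0 if motif[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or mismatch
                           prev[i] + 1,         # extra base in the sequence
                           cur[i - 1] + 1))     # motif base missing from the sequence
        if cur[m] <= max_edits:
            hits.append((j, cur[m]))
        prev = cur
    return hits
```

For instance, searching for `ACGT` in `AGT` with `max_edits=1` reports a hit ending at position 3 with one edit (the missing C).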
Funding: Project supported by the National Natural Science Foundation of China (Nos. 20174036, 20274040) and the Natural Science Foundation of Zhejiang Province (Nos. R404047, 10102), China.
Abstract: Using the complete genome of Plasmodium falciparum 3D7, which has 14 chromosomes, as an example, we have examined the distribution functions for the amount of C or G and of A or T in consecutive, non-overlapping blocks of m bases. The function P(S) for the number of consecutive C-G or A-T clusters of size S conforms to the relation P(S) ∝ e^(-αS); the values of the scaling exponent αCG are much larger than those of αAT; and αAT hardly changes across the 14 chromosomes, whereas αCG fluctuates considerably. We found that the maximum A-T cluster size is much larger than the maximum C-G cluster size, which implies the existence of large A-T clusters. Our study of the width function ξ(m) of the C-G content of clusters showed that it follows a good power law, ξ(m) ∝ m^(-γ). The average γ for the 14 chromosomes is 0.931. These investigations provide some insight into the nucleotide clusters of DNA sequences and help us understand other properties of DNA sequences.
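The cluster-size statistic P(S) in this abstract can be illustrated with a minimal sketch that counts runs of consecutive C/G bases and runs of consecutive A/T bases. It assumes the input contains only A, C, G, and T; the function name is hypothetical and the authors' exact counting procedure is not given in the abstract.

```python
from collections import Counter
from itertools import groupby


def cluster_size_counts(seq):
    """Count C-G clusters and A-T clusters by size.

    Returns two Counters mapping cluster size S to the number of clusters
    of that size. Assumes `seq` contains only A, C, G, T.
    """
    cg_counts, at_counts = Counter(), Counter()
    for is_cg, run in groupby(seq.upper(), key=lambda base: base in "CG"):
        size = sum(1 for _ in run)
        (cg_counts if is_cg else at_counts)[size] += 1
    return cg_counts, at_counts
```

Dividing each count by the total number of clusters gives an empirical P(S); plotting its logarithm against S would then let the exponent α be estimated from the slope.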
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 60774088), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20090031110029), and the Foundation of the Application Base and Frontier Technology Research Project of Tianjin (Grant No. 08JCZDJC21900).
Abstract: This paper investigates the existence of low-dimensional deterministic chaos in the AT and GC skew profiles of DNA sequences. DNA sequences from eight organisms are taken as samples. The skew profiles are analysed using the continuous wavelet transform and then nonlinear time series methods. The invariant measures of correlation dimension and the largest Lyapunov exponent are calculated. It is demonstrated that the AT and GC skew profiles of these DNA sequences all exhibit low-dimensional chaotic behaviour. This suggests that chaotic properties may be ubiquitous in the DNA sequences of all organisms.
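A skew profile of the kind analysed in this abstract can be sketched as a cumulative count difference along the sequence. The cumulative form is an assumption: the abstract does not say whether the authors use cumulative or windowed skew, and the function name is hypothetical.

```python
def cumulative_skew(seq, plus, minus):
    """Return the running count of `plus` bases minus `minus` bases.

    cumulative_skew(seq, "G", "C") gives a GC skew profile and
    cumulative_skew(seq, "A", "T") gives an AT skew profile.
    """
    total = 0
    profile = []
    for base in seq.upper():
        if base == plus:
            total += 1
        elif base == minus:
            total -= 1
        profile.append(total)
    return profile
```

The resulting list is a time series indexed by sequence position, which is the form that wavelet or nonlinear time series methods would take as input.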
Funding: Project supported by the National Natural Science Foundation of China (No. 20574052), the Program for New Century Excellent Talents in University, and the Natural Science Foundation of Zhejiang Province (Nos. R404047 and Y405011), China.
Abstract: In this paper we study the scaling behavior of nucleotide clusters in the 11 chromosomes of the Encephalitozoon cuniculi genome. The statistical distribution of nucleotide clusters for the 11 chromosomes is characterized by the scaling behavior P(S) ∝ e^(-αS), where S represents the nucleotide cluster size. The cluster-size distribution P(S1+S2), where S1+S2 is the total size of a sequential C-G cluster and A-T cluster, was also studied. P(S1+S2) follows an exponential decay. No case was found of a large C-G cluster following a large A-T cluster, or of a large A-T cluster following a large C-G cluster. We also discuss the relatively random walk length function L(n) and the local compositional complexity of nucleotide sequences based on a new model. These investigations may provide some insight into the nucleotide clusters of DNA sequences.
Funding: Supported by the National High-Tech R&D (863) Program and the National Natural Science Foundation of China (101-04-03-03-97).
Abstract: The univalent from the meiosis-metaphase spreads of F1 (Z2 × wheat variety Wan7107) was identified as the Agropyrum intermedium 2Ai-2 chromosome by GISH. The 2Ai-2 chromosomes were microisolated and collected. After two rounds of PCR amplification, the PCR products ranged from 150 to 3 000 bp, with predominant fragments at about 200-2 000 bp. Using Ag. intermedium genomic DNA as a probe, Southern blotting analysis confirmed that the products originated from the Ag. intermedium genome. The products were purified, ligated to pUC18, and then transformed into competent E. coli DH5α to produce a 2Ai-2 chromosome DNA library. The microcloning experiments produced approximately 5 × 10^5 clones; the size of the cloned inserts ranged from 200 to 1 500 bp, with an average of 580 bp. Using Ag. intermedium genomic DNA as a probe, dot blotting results showed that 56% of the clones in the library are unique/low-copy sequences and 44% are repetitive sequences. Four Ag. intermedium clones were screened from the library by RFLP: three clones (Mag065, Mag088, Mag139) are low/single-copy sequences and one clone (Mag104) is a repetitive sequence. GISH results indicated that Mag104 is an Ag. intermedium species-specific repetitive DNA sequence.
Funding: This work was financially supported by the National Natural Science Foundation of China (Nos. 29874012, 20174036, 20274040) and the Natural Science Foundation of Zhejiang Province (No. 10102).
Abstract: The characterization of long-range correlations and fractal properties of DNA sequences has proved to be a difficult though rewarding task, mainly because of the mosaic character of DNA, which consists of many patches of various lengths with different nucleotide compositions. In this paper we investigate statistical correlations among different positions in DNA sequences using the two-dimensional DNA walk. The root-mean-square fluctuation F(l) is described by a power law. The autocorrelation function C(l), which is used to measure linear dependence and periodicity, follows a power law, C(l) ~ l^(-μ). We also calculate the mean-square distance <R^2(l)> along the DNA chain, which may be expressed as <R^2(l)> ~ l^γ with 2 > γ > 1. Our investigations can provide some insight into long-range correlations in DNA sequences.
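The two-dimensional DNA walk and the mean-square distance <R^2(l)> from this abstract can be sketched as below. The step assignment (A and T along the x axis, C and G along the y axis) is an assumption; several conventions exist and the abstract does not state which one the authors use. Function names are hypothetical.

```python
# Assumed step directions for each base (one of several possible conventions).
STEPS = {"A": (-1, 0), "T": (1, 0), "C": (0, -1), "G": (0, 1)}


def walk_2d(seq):
    """Return the walk positions, starting at the origin, one step per base."""
    x = y = 0
    path = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def mean_square_distance(path, l):
    """Average squared displacement over all windows of l steps along the walk."""
    if not 1 <= l < len(path):
        raise ValueError("l must satisfy 1 <= l < len(path)")
    windows = len(path) - l
    total = 0
    for i in range(windows):
        dx = path[i + l][0] - path[i][0]
        dy = path[i + l][1] - path[i][1]
        total += dx * dx + dy * dy
    return total / windows
```

Computing `mean_square_distance` for a range of l and fitting a line to log <R^2(l)> against log l would give an estimate of the exponent γ.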
Funding: Supported by the Doctoral Program Foundation of Institutions of Higher Education, China (Grant No. 20070533075).
Abstract: This paper presents a model to describe the alternating current (AC) conductivity of DNA sequences, in which DNA is considered as a one-dimensional (1D) disordered system and electrons are transported via hopping between localized states. It finds that the AC conductivity of DNA sequences increases as the frequency of the external electric field rises, taking the form σ_ac(ω) ~ ω^2 ln^2(1/ω). The AC conductivity of DNA sequences also increases with temperature, although the dependence is weak. Meanwhile, at low temperatures the AC conductivity in the off-diagonally correlated case is much larger than that in the uncorrelated case of the Anderson limit, which indicates that off-diagonal correlations in DNA sequences have a great effect on the AC conductivity, while at high temperatures the off-diagonal correlations no longer play a vital role in electric transport. In addition, the proportion of nucleotide pairs p also plays an important role in the AC electron transport of DNA sequences. For p < 0.5, the conductivity of a DNA sequence decreases as p increases, while for p ≥ 0.5, the conductivity increases as p increases.
Funding: The Special Funds for Major National Basic Research Projects and the National Natural Science Foundation of China.
Abstract: Evidence seems to show that coding DNA is more random than noncoding DNA, but other conflicting evidence also exists. Based on the third-base degeneracy of codons, we regard the third position of codons as a 'noisy' position. By deleting one fixed position of non-overlapping triplets in a given sequence, three masked sequences may be deduced from the sequence. We have investigated the block-to-site mutual information functions of coding and noncoding sequences in yeast without and with the masking. Characteristics that distinguish coding from noncoding DNA have been found. It is observed that the strong correlations in the coding regions may be blocked by the third base of codons, and that proper masking can extract the correlations. The distribution of dimeric tandem repeats of unmasked sequences is also compared with that of masked sequences.
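The masking step described in this abstract, deleting one fixed position from each non-overlapping triplet, can be sketched as follows. The sketch assumes the reading frame starts at the first base and drops any incomplete trailing triplet; both are assumptions, and the function name is hypothetical.

```python
def masked_sequences(seq):
    """Return three masked sequences, with triplet position 1, 2, or 3 deleted.

    Assumes the reading frame starts at index 0; leftover bases that do not
    fill a complete triplet are ignored.
    """
    usable = len(seq) - len(seq) % 3
    triplets = [seq[i:i + 3] for i in range(0, usable, 3)]
    return ["".join(t[:k] + t[k + 1:] for t in triplets) for k in range(3)]
```

For example, `masked_sequences("ATGGCC")` yields the three strings obtained by removing the first, second, and third base of `ATG` and `GCC` respectively.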
Abstract: This technical note aims to show how any instructor teaching entomology can use the Basic Local Alignment Search Tool (BLAST) and the “one click” mode of Phylogeny.fr to teach undergraduate students about insect DNA similarity in a simple way. Teaching an entomology course requires the use of numerous tools to help students grasp different concepts. Knowing that there are more than one million described species of insects means that teaching students about insect identification and taxonomy can be challenging. However, here we present two easy exercises that could be used as classroom or take-home assignments to demonstrate various levels of DNA similarity among different insect taxa. Such exercises unlock students’ creativity and break the barrier of fear of bioinformatics. Moreover, they open up new ways for them to understand insect taxonomy through molecular biology and allow them to develop new skills that contribute to strengthening their scientific performance in the future, especially when they do research as graduate students. Finally, this note is an example of how to integrate simple bioinformatics tools into the teaching of entomology.