To improve the performance of Saitou and Nei's algorithm (SN) and Studier and Keppler's improved algorithm (SK) for constructing neighbor-joining phylogenetic trees, and to reduce the time complexity of the computation, a fast algorithm is proposed. The proposed algorithm combines three techniques. First, a linear array A[N] is introduced to store the sum of every row of the distance matrix (as in SK), which eliminates many repeated computations. Second, the value of A[i] is computed only once at the beginning of the algorithm and is then updated using three elements in each iteration. Third, a very compact formula for the sum of all the branch lengths of operational taxonomic units (OTUs) i and j is designed, and its correctness is proved. The experimental results show that the proposed algorithm is tens to hundreds of times faster than SN and roughly two times faster than SK as N increases, constructing a tree with 2 000 OTUs in 3 min on a current desktop computer. Trading space for time and reducing the computations in the innermost loop are the basic strategies for speeding up algorithms with many loops.
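The row-sum idea described in this abstract can be illustrated with a minimal Python sketch. It uses the standard neighbor-joining selection criterion Q(i, j) = (m - 2) d(i, j) - A[i] - A[j] and keeps the row sums up to date with three terms per remaining node, which is one plausible reading of "updated by three elements". The function name `neighbor_joining_pairs`, the dictionary layout and the toy matrix in the usage note are assumptions for illustration; the paper's compact branch-length formula is not reproduced here.

```python
def neighbor_joining_pairs(dist):
    """Return (joins, row_sum, d) for a symmetric distance matrix.

    joins is a list of (i, j, u): nodes i and j were merged into new node u.
    This is an illustrative sketch, not the authors' implementation.
    """
    n = len(dist)
    # Dict-of-dicts copy so nodes can be removed and added freely.
    d = {i: {j: dist[i][j] for j in range(n) if j != i} for i in range(n)}
    # Row sums are computed once up front (the array A in the abstract).
    row_sum = {i: sum(d[i].values()) for i in d}
    joins = []
    next_id = n
    while len(d) > 2:
        m = len(d)
        nodes = sorted(d)
        # Choose the pair that minimises Q(i, j) using the cached row sums.
        i, j = min(
            ((a, b) for pos, a in enumerate(nodes) for b in nodes[pos + 1:]),
            key=lambda p: (m - 2) * d[p[0]][p[1]] - row_sum[p[0]] - row_sum[p[1]],
        )
        dij = d[i][j]
        u = next_id
        next_id += 1
        d[u] = {}
        for k in nodes:
            if k in (i, j):
                continue
            duk = 0.5 * (d[i][k] + d[j][k] - dij)
            # Three-term update: add d(k, u), drop d(k, i) and d(k, j).
            row_sum[k] += duk - d[k][i] - d[k][j]
            del d[k][i], d[k][j]
            d[k][u] = duk
            d[u][k] = duk
        row_sum[u] = sum(d[u].values())
        del d[i], d[j], row_sum[i], row_sum[j]
        joins.append((i, j, u))
    return joins, row_sum, d
```

For example, calling `neighbor_joining_pairs([[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]])` on this hypothetical additive matrix joins nodes 0 and 1 first. Because each row sum is adjusted rather than recomputed, the work inside the pair-selection loop stays constant per pair.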
[Objective] The molecular weight, isoelectric point, signal peptide, domain and other properties of the proteins encoded by the known cystatin genes were analyzed. [Method] Cystatin genes were searched in NCBI and the related amino acid sequences were downloaded. SMART software was used to predict the domains. The SignalP program was used to search for signal peptides. The TMHMM program was used to search for and predict transmembrane domains. The CLUSTAL W program was used to perform multiple sequence alignment. Using MEGA3.1 software,...
Degenerate primers are particularly useful in amplifying homologous genes from different organisms. This paper describes a method for designing degenerate primers from a multiple alignment of DNA sequences of the Hsp70 gene family using the ClustalW algorithm. The authors used an in silico approach to find homology among three DNA sequence accessions: X67711.2 for Oryza sativa Hsp70, AY372071.1 for Nicotiana tabacum Hsp70 and L41253.2 for Lycopersicon esculentum Hsc70. The three accessions were retrieved by the BLASTn program according to their expected value (E-value). Multiple sequence alignment was performed with the ClustalW algorithm to produce conserved blocks and determine the consensus region, which was used to design the forward and reverse primers with the primer select module of DNAStar Lasergene V7. The in silico PCR module of the FASTPCR program ver. 4.0.8 was used to determine the melting temperatures (Tm) and predict the PCR product size. The designed degenerate primers showed homology with the DNA templates of the three accessions, with at least 80% identity. Degenerate PCR showed that the bands of the amplified products for the three accessions were detected at the same position as the 400 bp marker, a difference of about 15 bp from the in silico PCR product (385 bp). In conclusion, this study highlights the importance of the ClustalW algorithm for designing degenerate primers.
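As a rough illustration of how a degenerate primer can be read off a conserved alignment block, the following Python sketch collapses each alignment column into its IUPAC nucleotide code and counts how many concrete primers the result represents. The function names and the toy sequences in the usage note are hypothetical; this is not the DNAStar or FASTPCR workflow used in the study, and it assumes a gap-free block containing only A, C, G and T.

```python
# Standard IUPAC nucleotide ambiguity codes.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases observed in a column -> IUPAC code.
BASES_TO_CODE = {frozenset(bases): code for code, bases in CODES.items()}


def degenerate_consensus(aligned):
    """Collapse equal-length, gap-free aligned sequences into IUPAC codes."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must have the same aligned length")
    consensus = []
    for column in zip(*aligned):
        bases = frozenset(base.upper() for base in column)
        if bases not in BASES_TO_CODE:
            raise ValueError("only A, C, G and T are supported in this sketch")
        consensus.append(BASES_TO_CODE[bases])
    return "".join(consensus)


def degeneracy(primer):
    """Number of distinct concrete sequences a degenerate primer encodes."""
    total = 1
    for code in primer:
        total *= len(CODES[code])
    return total
```

With the toy block `["ATGC", "ATGT", "ACGC"]`, `degenerate_consensus` returns `"AYGY"`, and `degeneracy("AYGY")` is 4. Lower degeneracy generally means fewer primer species compete in the reaction, which is why highly conserved blocks are preferred.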
Bacterial diversity at 14 sites in the East China Sea was investigated by culture-dependent methods. The impact of human activities on marine bacteria was primarily studied, and the characteristics of bacterial communities in different areas were analyzed. A total of 396 strains were obtained. These strains belong to 4 phyla, 9 classes and 146 species according to 16S rDNA sequence alignment. For 32 strains, the 16S rDNA sequence similarity between the isolated strain and its most closely related species was lower than 98%. The results indicate abundant microbial diversity and a large number of unknown microbial resources in the East China Sea. Isolated strains were dominated by γ-proteobacteria (64%), α-proteobacteria (18%) and Firmicutes (15%). Actinobacteria and Bacteroidetes accounted for less than 3%. Microbial community composition, diversity and abundance differed among areas at various distances from land. The farther a region was from land, the lower its Shannon index (H') and Margalef index (DMg) values were.
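For readers unfamiliar with the two indices, a short Python sketch of their usual textbook definitions follows: H' = -Σ p_i ln p_i over species proportions p_i, and DMg = (S - 1) / ln N for S species and N individuals. The function names and the counts in the usage note are hypothetical; the abstract does not give the per-site counts or state which logarithm base was used, so the natural logarithm is an assumption.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over species with count > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def margalef_index(counts):
    """Margalef richness index D_Mg = (S - 1) / ln N."""
    species = sum(1 for c in counts if c > 0)
    total = sum(counts)
    return (species - 1) / math.log(total)
```

For two equally abundant species with 10 isolates each, `shannon_index([10, 10])` equals ln 2 (about 0.693) and `margalef_index([10, 10])` equals 1 / ln 20. Both values rise as more species are added to a sample of the same size.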
Although high-quality multiple sequence alignment is an essential task in bioinformatics, it has become a major challenge due to the explosion in the amount of molecular data. The most time- and space-consuming phase is the distance matrix computation. This paper addresses this issue by proposing a vectorized parallel method that performs the huge number of similarity comparisons faster and in less space. Performance tests on real biological datasets using a Core i7 show superior results in terms of time and space.