In this letter, we briefly describe a self-adapting hidden Markov model (SA HMM) program and its application to multiple sequence alignment. The program consists of a two-stage optimisation algorithm.
Soybean mosaic virus (SMV), a member of the genus Potyvirus, is a major pathogen of soybean plants in China, and 16 SMV strains have been identified nationwide based on a former detailed SMV classification system. As the P3 gene is thought to be involved in viral replication, systemic infection, pathogenicity, and overcoming resistance, knowledge of the P3 gene sequences of SMV and other potyviruses would be useful in efforts to understand the genetic relationships among them and to control the disease. P3 gene sequences were obtained from representative isolates of the above-mentioned 16 SMV strains and were compared with other SMV strains and 16 Potyvirus species from the National Center for Biotechnology Information (NCBI) GenBank database. The P3 genes from the 16 SMV isolates are composed of 1041 nucleotides, encoding 347 amino acids, and share 90.7-100% nucleotide (NT) sequence identities and 95.1-100% amino acid (AA) sequence identities. The P3 coding regions of the 16 SMV isolates share high identities (92.4-98.9% NT and 96.0-100% AA) with the reported Korean isolates, followed by the USA isolates (88.5-97.9% NT and 91.4-98.6% AA), and share low identities (80.5-85.2% NT and 82.1-84.7% AA) with the reported HZ 1 and P isolates from Pinellia ternata. The sequence identities of the P3 genes between SMV and the 16 potyviruses varied from 44.4 to 81.9% in the NT sequences and from 21.4 to 85.3% in the AA sequences. Among them, SMV was most closely related to Watermelon mosaic virus (WMV), with 76.0-81.9% NT and 77.5-85.3% AA identities. In addition, the SMV isolates and potyvirus species were clustered into six distinct groups. All the SMV strains isolated from soybean were clustered in Group I, and the remaining species were clustered in other groups. A multiple sequence alignment analysis of the C-terminal regions indicated that the P3 genes within a species were highly conserved, whereas those among species were relatively variable.
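The NT and AA identity ranges quoted above are pairwise percent identities. As a minimal sketch of how such a figure can be computed from two already-aligned sequences, the Python function below counts matching columns. The function name `percent_identity` and the convention of skipping any column that contains a gap are assumptions made for illustration; the abstract does not state which denominator or software the authors used.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: columns containing a gap are excluded from the denominator.
    Other conventions (e.g. dividing by the full alignment length) also exist.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy example with hypothetical sequences (not data from the study):
# five ungapped columns, four of which match.
print(percent_identity("ACGT-A", "ACCTTA"))
```

Applying the function to every pair of isolates and taking the minimum and maximum yields a range of the kind reported in the abstract.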
In this paper, we report a multiple sequence alignment result on the basis of 10 amino acid sequences of the M protein, which come from different coronaviruses (4 SARS associated and 6 others known). The alignment model was based on the profile HMM (Hidden Markov Model), and the model training was implemented through the SAHMM (Self Adapting Hidden Markov Model) software developed by the authors.
Aligning many protein or DNA sequences is a very complex operation in bioinformatics applications. Existing multiple sequence alignment (MSA) techniques trade off their objectives: techniques concerned with speed ignore accuracy, whereas techniques concerned with accuracy ignore speed. Alignment here means finding the similarity between different sequences with high accuracy. The growing number of sequences makes the problem very complex. Because of the emergence, rapid development, and dependence on gene sequencing, sequence alignment has become important in every biological relationship analysis process. Calculating the number of similar amino acids is the primary method for showing that two sequences are related. Time is a main issue in any alignment technique. In this paper, a more effective MSA method is proposed for handling massive multiple protein sequence alignment while maintaining the highest accuracy with less time consumption. The proposed method depends on the Artificial Fish Swarm (AFS) algorithm, which can overcome most of the challenges of MSA problems. AFS is exploited to obtain high accuracy in adequate time. AFS has become increasingly popular in various applications such as artificial intelligence, computer vision, machine learning, and data-intensive applications. It mimics the behavior of fish trying to find food in nature. The proposed AFS mechanisms, such as preying, swarming, following, moving, and leaping, help increase accuracy and improve speed by decreasing execution time. The sense organs that help the artificial fish collect information and vision from the environment contribute to accuracy. These features make the alignment operation more efficient and especially suitable for large-scale data. The implementation and experimental results place the proposed AFS as a first choice for alignment compared with well-known multiple sequence alignment algorithms.
[Objective] The molecular weight, isoelectric point, signal peptide, domain and other properties of the proteins encoded by the known cystatin genes were analyzed. [Method] Cystatin genes were searched in NCBI and the related amino acid sequences were downloaded. SMART software was used to predict the domain. The SignalP program was used to search for signal peptides. The TMHMM program was used to search for and predict the transmembrane domain. The CLUSTAL W program was used to make the multiple sequence alignment. Using MEGA3.1 software,...
In this paper, a direct sequence spread spectrum multiple access (DS/SSMA) communication system employing serially concatenated trellis coded modulation (TCM) and continuous phase modulation (CPM) over a flat Rayleigh fading channel is presented. The performance of this concatenated TCM/CPM DS/SSMA system is evaluated by theoretical analysis and numerical simulations. The results demonstrate that significant improvements in error probability over the system with single TCM or CPM of different modulation indices can be achieved under the same conditions.
Objective: To investigate the anticancer property of marine sediment actinomycetes against two different breast cancer cell lines. Methods: In vitro anticancer activity was assessed against breast (MCF-7 and MDA-MB-231) cancer cell lines. Partial sequencing of the 16S rRNA gene, phylogenetic tree construction, multiple sequence analysis and secondary structure analysis were also carried out with the actinomycete isolates. Results: Of the five selected actinomycete isolates, ACT01 and ACT02 showed IC50 values of (10.13±0.92) and (22.34±5.82) μg/mL, respectively, for the MCF-7 cell line at 48 h, but ACT01 showed the minimum IC50 value (18.54±2.49 μg/mL) for the MDA-MB-231 cell line. Further, the 16S rRNA partial sequences of the ACT01, ACT02, ACT03, ACT04 and ACT05 isolates were deposited in the NCBI data bank with the accession numbers GQ478246, GQ478247, GQ478248, GQ478249 and GQ478250, respectively. The phylogenetic tree analysis showed that the isolates ACT02 and ACT03 were represented in group I and III, respectively, but ACT01 and ACT02 were represented in group II. The multiple sequence alignment of the actinomycete isolates showed that the maximum identical conserved regions were identified in the nucleotide regions of 125 to 221st base pairs, 65 to 119th base pairs and 55, 48 and 31st base pairs. Secondary structure prediction of the 16S rRNA showed that the maximum free energy was consumed with the ACT03 isolate (-45.4 kcal/mol) and the minimum free energy was consumed with the ACT04 isolate (-7.6 kcal/mol). Conclusions: The actinomycete isolates ACT01 and ACT02 (GQ478246 and GQ478247), which were isolated from a sediment sample, can be further used as anticancer agents against breast cancer cell lines.
A fundamental goal in cellular signaling is to understand allosteric communication, the process by which signals originating at one site in a protein propagate reliably to affect distant functional sites. The general principles of protein structure that underlie this process remain unknown. Statistical coupling analysis (SCA) is a statistical technique that uses evolutionary data of a protein family to measure correlation between distant functional sites and suggests allosteric communication. In proteins, very distant and small interactions between collections of amino acids provide the communication, which can be important for the signaling process. In this paper, we present the SCA of the protein alignment of the esterase family (Pfam ID: PF00756) containing the sequence of antigen 85C secreted by Mycobacterium tuberculosis to identify a subset of interacting residues. Clustering analysis of the pairwise correlation highlighted seven important residue positions in the esterase family alignments. These residues were then mapped on the crystal structure of antigen 85C (PDB ID: 1DQZ). The mapping revealed correlation between 3 distant residues (Asp38, Leu123 and Met125) and suggests allosteric communication between them. This information can be used for developing a new drug against this fatal disease.
Advancements in next-generation sequencer (NGS) platforms have improved NGS sequence data production and reduced the cost involved, which has resulted in the production of a large amount of genome data. The downstream analysis of multiple associated sequences has become a bottleneck for the growing genomic data due to storage and space utilization issues in the domain of bioinformatics. The traditional string-matching algorithms are efficient for small data sequences and cannot process large amounts of data for downstream analysis. This study proposes a novel bit-parallelism algorithm called BitmapAligner to overcome the issues faced due to a large number of sequences and to improve the speed and quality of multiple sequence alignment (MSA). The input files (sequences) tested over BitmapAligner can be easily managed and organized using the Hadoop distributed file system. The proposed aligner converts the test file (the whole genome sequence) into binaries of an equal length of the sequence, line by line, before the sequence alignment processing. The Hadoop distributed file system splits the larger files into blocks, based on a defined block size, which is 128 MB by default. BitmapAligner can accurately process the sequence alignment using the bitmask approach on large-scale sequences after sorting the data. The experimental results indicate that BitmapAligner operates in real time with a large number of sequences. Moreover, BitmapAligner achieves the exact start and end positions of the pattern sequence to test the MSA application in the whole genome query sequence. The MSA’s accuracy is verified by the bitmask indexing property of the bit-parallelism extended shifts (BXS) algorithm. The dynamic and exact approach of the BXS algorithm is implemented through the MapReduce function of Apache Hadoop. Conversely, the traditional seeds-and-extend approach faces the risk of errors while identifying the pattern sequences’ positions. Moreover, the proposed model resolves the large-scale data challenges that are covered through MapReduce in the Hadoop framework. Hive, Yarn, HBase, Cassandra, and many other pertinent flavors are to be used in the future for data structuring and annotations on the top layer of Hadoop, since Hadoop is primarily used for data organization and handles text documents.
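To make the bit-parallel idea concrete, here is a minimal Python sketch of the classic Shift-And exact matcher, which encodes the pattern as per-character bitmasks and reports the start and end position of every occurrence. This is a swapped-in textbook technique, not the BXS algorithm or the BitmapAligner implementation described in the abstract, and it runs on a single machine with no Hadoop or MapReduce involved. The function name `shift_and_search` is hypothetical.

```python
def shift_and_search(text, pattern):
    """Return (start, end) index pairs (inclusive) of exact pattern matches.

    Classic Shift-And: bit i of `state` is set when pattern[:i + 1]
    matches the text ending at the current position.
    """
    m = len(pattern)
    if m == 0:
        return []
    # One bitmask per character: bit i is set if pattern[i] == character.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0
    hits = []
    for end, ch in enumerate(text):
        # Extend every partial match by one character, and start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append((end - m + 1, end))
    return hits


# Hypothetical toy input: the pattern occurs at positions 2-5 and 6-9.
print(shift_and_search("ACGTACGTAC", "GTAC"))
```

Each text character costs one shift, one OR and one AND regardless of pattern length, which is the property that makes bitmask approaches attractive for long genomic inputs.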
The main purpose of this paper is to introduce the general Smarandache multiplicative sequence based on the Smarandache multiplicative sequence, and calculate the value of some infinite series involving these sequences.
In Direct Sequence Code Division Multiple Access (DS-CDMA) systems, the chip waveform affects the implementation, system bandwidth, envelope uniformity, eye pattern and Multiple user Access Interference (MAI). In this paper, based on an elementary density function of a second order polynomial, a class of second order continuity pulses is proposed. From this class of pulses, we can find some members having a faster decaying rate, bigger eye opening, more uniform envelope and stronger anti-MAI capability than the Nyquist waveform. The normalized-bandwidth-pulse-shape-factor product, the decaying rate of the tail of the time waveform, the opening of the eye diagram, and the envelope uniformity of the second order continuity pulses are addressed in the paper, providing the basic information for the selection of the chip pulse for CDMA systems.
In this paper, the complexity and performance of Auxiliary Vector (AV) based reduced-rank filtering are addressed. The AV filters presented in previous papers have the general form of the sum of the signature vector of the desired signal and a set of weighted AVs, and can be classified into three categories according to the orthogonality of their AVs and the optimality of the weight coefficients of the AVs. The AV filter with orthogonal AVs and optimal weight coefficients has the best performance, but requires considerable computational complexity and suffers from numerically unstable operation. In order to reduce its computational load while keeping the superior performance, several low complexity algorithms are proposed to efficiently calculate the AVs and their weight coefficients. The diagonal loading technique is also introduced to solve the numerical instability problem without increasing complexity. The performance of the three types of AV filters is also compared through their application to Direct Sequence Code Division Multiple Access (DS-CDMA) systems for interference suppression.
Orange-spotted grouper (Epinephelus coioides) is an important mariculture fish, and genomic breeding of this grouper species has been hindered by the lack of efficient genotyping tools. Here, we developed a single nucleotide polymorphism (SNP) genotyping technology based on multiplex PCR enrichment capture sequencing, which mainly targets specific regions for high-throughput sequencing, and 741 SNPs were designed for genomic selection (GS) of growth and ammonia tolerance traits at the same time. The multiplex PCR enrichment capture sequencing assay showed that the genotyping efficiency was more than 99% in the orange-spotted grouper and the predictive accuracy of body weight and ammonia tolerance traits was 82% and 96%, respectively. More importantly, the average identity of the sequences with these SNPs aligned to the genomes of giant grouper (E. lanceolatus) and brown-marbled grouper (E. fuscoguttatus) was over 96% in both cases. Test data showed that the SNP genotyping efficiency was more than 94% in both giant grouper and brown-marbled grouper. In summary, these results indicated that the SNP loci and the genotyping approach based on multiplex PCR enrichment capture sequencing are suitable for GS of growth and ammonia tolerance traits in various grouper species, and would provide technical support for practical grouper breeding.
Creating a multi-gene alignment matrix for phylogenetic analysis using organelle genomes involves aligning single-gene datasets manually, a process that can be time-consuming and prone to errors. The HomBlocks pipeline has been created to eliminate the inaccuracies arising from manual operations. The processing of a large number of sequences, however, remains a time-consuming task. To overcome this challenge, we develop a speedy and efficient method called Organelle Genomes for Phylogenetic Analysis (ORPA). ORPA can quickly generate multiple sequence alignments for whole-genome comparisons by parsing the result files of NCBI BLAST, completing the task in just 1 min. With increasing data volume, the efficiency of ORPA is even more pronounced: it is over 300 times faster than HomBlocks in aligning 60 higher-plant chloroplast genomes. The phylogenetic tree outputs from ORPA are equivalent to those from HomBlocks, indicating its outstanding efficiency. Due to its speed and accuracy, ORPA can identify species-level evolutionary conflicts, providing valuable insights into evolutionary cognition.
There are many web-based multiple sequence alignment services accessible around the world. However, many researchers working on biological sequence analysis still struggle with multiple sequence alignment software that is inefficient, has an unfriendly user interface, and has limited capability. In this study, we provide a comprehensive survey of regional and continental facilities that provide web-based alignment services. We also analyze and identify much needed services that are not available through these existing service providers. We then implement a web-based model to address these needs. From that perspective, our web-based multiple sequence alignment server, SeqAna, provides a unique set of services that none of the studied facilities have. For example, SeqAna provides a multiple sequence alignment scoring and ranking service. This service, the only one of its kind, allows SeqAna's users to perform multiple sequence alignment with several alignment tools and rank the results of these alignments in order of quality. With this service, SeqAna's users will be able to identify which alignment tools are more appropriate for their specific set of sequences. In addition, SeqAna's users can customize a small alignment sample as a reference for SeqAna to automatically identify the best tool to align their large set of sequences.
Multiple sequence alignment (MSA) is the alignment among more than two molecular biological sequences, which is a fundamental method to analyze evolutionary events such as mutations, insertions, deletions, and re-arrangements. In theory, a dynamic programming algorithm can be employed to produce the optimal MSA. However, this leads to an explosive increase in computing time and memory consumption as the number of sequences increases (Taylor, 1990). So far, MSA is still regarded as one of the most challenging problems in bioinformatics and computational biology (Chatzou et al., 2016).
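The dynamic programming approach mentioned above is easiest to see in the two-sequence case. The sketch below computes the optimal global alignment score with the standard Needleman-Wunsch recurrence; the function name `nw_score` and the scoring values (match +1, mismatch -1, linear gap -2) are illustrative assumptions, not parameters taken from the cited work. For two sequences the table has (len(a) + 1) × (len(b) + 1) cells; extending the same recurrence to N sequences of length L needs on the order of L^N cells, which is the explosive growth the abstract refers to.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of two sequences (Needleman-Wunsch).

    Only two rows of the table are kept, so memory is O(len(b)).
    Scoring values are illustrative assumptions with a linear gap penalty.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in `b`
            left = curr[j - 1] + gap  # gap in `a`
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Hypothetical toy input: three matches and one gap.
print(nw_score("ACGT", "AGT"))
```

Practical MSA tools avoid the exponential table by using heuristics such as progressive alignment, at the cost of no longer guaranteeing the optimum.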
Although high quality multiple sequence alignment is an essential task in bioinformatics, it has become a big dilemma due to the gigantic explosion in the amount of molecular data. The most time- and space-consuming phase is the distance matrix computation. This paper addresses this issue by proposing a vectorized parallel method that accomplishes the huge number of similarity comparisons faster and in less space. Performance tests on real biological datasets using core-iT show superior results in terms of time and space.
Funding (soybean mosaic virus P3 gene study): Supported by the National Natural Science Foundation of China (30671266, 31101164), the National Basic Research Program of China (2006CB101708, 2009CB118404), the National 863 Program of China (2006AA100104), the 111 Project of the Ministry of Education of China (B08025), and the Youth Science and Technology Innovation Foundation of Nanjing Agriculture University, China (KJ2010002).
Funding (Artificial Fish Swarm alignment study): The authors extend their appreciation to the Deanship of Scientific Research at Jouf University for funding this work through research Grant No. (DSR2020-01-414).
Funding (marine sediment actinomycetes study): Supported by the Indian Council of Medical Research, New Delhi (grant No. 59/6/200/BMS/TRM).
Funding (BitmapAligner study): This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2018R1C1B5084424) and in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2019R1A6A1A03032119).
Abstract: Advancements in next-generation sequencer (NGS) platforms have improved NGS sequence data production and reduced the cost involved, which has resulted in the production of a large amount of genome data. The downstream analysis of multiple associated sequences has become a bottleneck for the growing genomic data due to storage and space utilization issues in the domain of bioinformatics. Traditional string-matching algorithms are efficient for small data sequences but cannot process large amounts of data for downstream analysis. This study proposes a novel bit-parallelism algorithm called BitmapAligner to overcome the issues caused by a large number of sequences and to improve the speed and quality of multiple sequence alignment (MSA). The input files (sequences) tested with BitmapAligner can be easily managed and organized using the Hadoop distributed file system. The proposed aligner converts the test file (the whole genome sequence) into binaries of equal length to the sequence, line by line, before the sequence alignment processing. The Hadoop distributed file system splits larger files into blocks based on a defined block size, which is 128 MB by default. BitmapAligner can accurately process the sequence alignment using the bitmask approach on large-scale sequences after sorting the data. The experimental results indicate that BitmapAligner operates in real time with a large number of sequences. Moreover, BitmapAligner finds the exact start and end positions of the pattern sequence to test the MSA application in the whole genome query sequence. The MSA's accuracy is verified by the bitmask indexing property of the bit-parallelism extended shifts (BXS) algorithm. The dynamic and exact approach of the BXS algorithm is implemented through the MapReduce function of Apache Hadoop. Conversely, the traditional seed-and-extend approach faces the risk of errors while identifying the pattern sequences' positions. Moreover, the proposed model resolves the large-scale data challenges that are covered through MapReduce in the Hadoop framework. Hive, Yarn, HBase, Cassandra, and many other pertinent flavors are to be used in the future for data structuring and annotations on the top layer of Hadoop, since Hadoop is primarily used for data organization and handles text documents.
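The abstract does not spell out the BXS algorithm, so the following is only a minimal sketch of the general bit-parallel idea using the classic Shift-And method, which is a different and simpler technique. It shows how per-symbol bitmasks let one machine word track every partial match at once and how exact start and end positions fall out of the final state bit. The function name `shift_and_search` is hypothetical and not taken from BitmapAligner.

```python
def shift_and_search(text, pattern):
    """Return inclusive (start, end) index pairs of exact matches of pattern in text.

    Classic Shift-And bit-parallel matching (an illustration only, not BXS).
    """
    m = len(pattern)
    if m == 0:
        return []

    # One bitmask per symbol: bit i is set when pattern[i] equals that symbol.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[0..i] matches the text ending here
    hits = []
    for j, ch in enumerate(text):
        # Extend every partial match by one symbol and start a new one at bit 0.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append((j - m + 1, j))
    return hits


if __name__ == "__main__":
    print(shift_and_search("ACGTACGTGACG", "ACG"))
```

Each text symbol costs one shift, one OR, and one AND regardless of pattern length (as long as the pattern fits in a word), which is the property bit-parallel aligners exploit.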
Abstract: The main purpose of this paper is to introduce the general Smarandache multiplicative sequence, based on the Smarandache multiplicative sequence, and to calculate the values of some infinite series involving these sequences.
Funding: Supported by the University Natural Science Research Project of Jiangsu (No. 03KJB510088) and the National Natural Science Foundation of China (No. 60572130).
Abstract: In Direct Sequence Code Division Multiple Access (DS-CDMA) systems, the chip waveform affects the implementation, system bandwidth, envelope uniformity, eye pattern, and Multiple user Access Interference (MAI). In this paper, based on an elementary density function of a second order polynomial, a class of second order continuity pulses is proposed. Within this class of pulses, some members have a faster decaying rate, a bigger eye opening, a more uniform envelope, and stronger anti-MAI capability than the Nyquist waveform. The paper addresses the normalized-bandwidth-pulse-shape-factor product, the decaying rate of the tail of the time waveform, the opening of the eye diagram, and the envelope uniformity of the second order continuity pulses, which provides the basic information for selecting the chip pulse for CDMA systems.
Abstract: In this paper, the complexity and performance of Auxiliary Vector (AV) based reduced-rank filtering are addressed. The AV filters presented in previous papers have the general form of the sum of the signature vector of the desired signal and a set of weighted AVs, and they can be classified into three categories according to the orthogonality of their AVs and the optimality of the weight coefficients of the AVs. The AV filter with orthogonal AVs and optimal weight coefficients has the best performance, but it requires considerable computation and suffers from numerically unstable operation. In order to reduce its computational load while keeping the superior performance, several low complexity algorithms are proposed to efficiently calculate the AVs and their weight coefficients. The diagonal loading technique is also introduced to solve the numerical instability problem without increasing complexity. The performance of the three types of AV filters is also compared through their application to Direct Sequence Code Division Multiple Access (DS-CDMA) systems for interference suppression.
Funding: National Natural Science Foundation of China (No. 31872572); Natural Science Foundation for Fundamental Research in Shenzhen (No. JCYJ20190812105801661); Shenzhen Dapeng Special Program for Industrial Development (No. KJYF202101-01).
Abstract: Orange-spotted grouper (Epinephelus coioides) is an important mariculture fish, and genomic breeding of this grouper species has been hindered by the lack of efficient genotyping tools. Here, we developed a single nucleotide polymorphism (SNP) genotyping technology based on multiplex PCR enrichment capture sequencing, which mainly targets specific regions for high-throughput sequencing, and 741 SNPs were designed for simultaneous genomic selection (GS) of growth and ammonia tolerance traits. The multiplex PCR enrichment capture sequencing assay showed that the genotyping efficiency was more than 99% in the orange-spotted grouper and that the predictive accuracy of body weight and ammonia tolerance traits was 82% and 96%, respectively. More importantly, the average identity of the sequences containing these SNPs, when aligned to the genomes of giant grouper (E. lanceolatus) and brown-marbled grouper (E. fuscoguttatus), was over 96% in both cases. Test data showed that the SNP genotyping efficiency was more than 94% in both giant grouper and brown-marbled grouper. In summary, these results indicate that the SNP loci and the genotyping approach based on multiplex PCR enrichment capture sequencing are suitable for GS of growth and ammonia tolerance traits in various grouper species and would provide technical support for practical grouper breeding.
Funding: Supported by the National Key R&D Program of China (2018YFA0903200) and the Science Technology and Innovation Commission of Shenzhen Municipality of China (ZDSYS 20200811142605017). It was also supported by the Innovation Program of Chinese Academy of Agricultural Sciences and the Elite Young Scientists Program of CAAS.
Abstract: Creating a multi-gene alignment matrix for phylogenetic analysis using organelle genomes involves aligning single-gene datasets manually, a process that can be time-consuming and prone to errors. The HomBlocks pipeline was created to eliminate the inaccuracies arising from manual operations. Processing a large number of sequences, however, remains a time-consuming task. To address this challenge, we develop a fast and efficient method called Organelle Genomes for Phylogenetic Analysis (ORPA). ORPA can quickly generate multiple sequence alignments for whole-genome comparisons by parsing the result files of NCBI BLAST, completing the task in just 1 min. With increasing data volume, the efficiency of ORPA is even more pronounced: it is over 300 times faster than HomBlocks in aligning 60 higher-plant chloroplast genomes. The phylogenetic tree outputs from ORPA are equivalent to those from HomBlocks, indicating its outstanding efficiency. Due to its speed and accuracy, ORPA can identify species-level evolutionary conflicts, providing valuable evolutionary insights.
Abstract: There are many web-based multiple sequence alignment services accessible around the world. However, many researchers working on biological sequence analysis still struggle with multiple sequence alignment software that is inefficient, has an unfriendly user interface, and offers limited capability. In this study, we provide a comprehensive survey of regional and continental facilities that provide web-based alignment services. We also analyze and identify much-needed services that are not available through these existing service providers. We then implement a web-based model to address these needs. From that perspective, our web-based multiple sequence alignment server, SeqAna, provides a unique set of services that none of the studied facilities offers. For example, SeqAna provides a multiple sequence alignment scoring and ranking service. This service, the only one of its kind, allows SeqAna's users to perform multiple sequence alignment with several alignment tools and rank the results of these alignments in order of quality. With this service, SeqAna's users will be able to identify which alignment tools are more appropriate for their specific set of sequences. In addition, SeqAna's users can customize a small alignment sample as a reference for SeqAna to automatically identify the best tool to align their large set of sequences.
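The abstract does not say how SeqAna scores alignments, so the sketch below assumes a plain sum-of-pairs score, one common way to compare alignments of the same sequences produced by different tools. The scoring values (`match`, `mismatch`, `gap`), the function names, and the tool labels are all hypothetical.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length gapped strings."""
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue            # gap against gap carries no information
            elif a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


def rank_alignments(candidates):
    """Return tool names ordered from highest to lowest sum-of-pairs score."""
    return sorted(candidates, key=lambda name: sum_of_pairs(candidates[name]),
                  reverse=True)


if __name__ == "__main__":
    candidates = {
        "toolA": ["ACGT", "ACGT", "AC-T"],  # hypothetical output of one aligner
        "toolB": ["ACGT", "ACGT", "A-CT"],  # hypothetical output of another
    }
    print(rank_alignments(candidates))
```

A production service would more likely use a substitution matrix and affine gap penalties; the ranking step stays the same once each alignment has a numeric score.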
Funding: Supported by the National Key R&D Program of China (Nos. 2017YFB0202600, 2016YFC1302500, 2016YFB0200400 and 2017YFB0202104), the National Natural Science Foundation of China (Nos. 61772543, U1435222, 61625202, 61272056 and 61771331), and the Guangdong Provincial Department of Science and Technology (No. 2016B090918122).
Abstract: Multiple sequence alignment (MSA) is the alignment of more than two molecular biological sequences and is a fundamental method for analyzing evolutionary events such as mutations, insertions, deletions, and rearrangements. In theory, a dynamic programming algorithm can be employed to produce the optimal MSA. However, this leads to an explosive increase in computing time and memory consumption as the number of sequences increases (Taylor, 1990). So far, MSA is still regarded as one of the most challenging problems in bioinformatics and computational biology (Chatzou et al., 2016).
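To make the "explosive increase" concrete, the following sketch counts the work of the textbook exact dynamic program, assuming k sequences that all have the same length n. That program fills a k-dimensional lattice of (n + 1)^k cells, and each cell looks at up to 2^k − 1 predecessor cells. The function name `msa_dp_cost` is hypothetical.

```python
def msa_dp_cost(num_sequences, length):
    """Return (cells, cell_updates) for exact DP over equal-length sequences.

    cells        = (length + 1) ** num_sequences   lattice points to store
    cell_updates = cells * (2 ** num_sequences - 1) predecessor lookups
    """
    cells = (length + 1) ** num_sequences
    predecessors = 2 ** num_sequences - 1
    return cells, cells * predecessors


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        cells, updates = msa_dp_cost(k, 100)
        print(f"k={k:2d}: {cells:.3e} cells, {updates:.3e} updates")
```

For two sequences of length 100 the lattice has about ten thousand cells; for ten such sequences it exceeds 10^20, which is why practical tools rely on heuristics rather than the exact method.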
Abstract: Although high-quality multiple sequence alignment is an essential task in bioinformatics, it has become a major challenge owing to the explosion in the amount of molecular data. The most time- and space-consuming phase is the distance matrix computation. This paper addresses this issue by proposing a vectorized parallel method that performs the huge number of similarity comparisons faster and in less space. Performance tests on real biological datasets using core-iT show superior results in terms of time and space.
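The abstract gives no details of the vectorized method, so the sketch below only illustrates the baseline it aims to speed up: a serial all-pairs distance matrix. It assumes a k-mer Jaccard distance as the similarity measure, which is a stand-in chosen for brevity and not the paper's measure. The names `kmer_set` and `distance_matrix` are hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def distance_matrix(seqs, k=3):
    """Symmetric matrix of k-mer Jaccard distances between all sequence pairs."""
    n = len(seqs)
    profiles = [kmer_set(s, k) for s in seqs]
    dist = [[0.0] * n for _ in range(n)]
    # n * (n - 1) / 2 independent comparisons: the part worth parallelizing.
    for i, j in combinations(range(n), 2):
        union = profiles[i] | profiles[j]
        shared = profiles[i] & profiles[j]
        d = 1.0 - len(shared) / len(union) if union else 0.0
        dist[i][j] = dist[j][i] = d
    return dist


if __name__ == "__main__":
    for row in distance_matrix(["ACGTACGT", "ACGTACGA", "TTTTTTTT"]):
        print(row)
```

Because every pair is computed independently, the loop body is a natural target for SIMD vectorization or multi-core execution, and storing only the upper triangle roughly halves the memory.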