Abstract: The study and comparison of sequences of characters from a finite alphabet is relevant to various areas of science, notably molecular biology. Measuring sequence similarity involves considering the possible sequence alignments in order to find an optimal one, for which the “distance” between the sequences is minimal. In bioinformatics this is an important and difficult problem because the sequences are long (at least 100 characters), which leads to high computational complexity and large memory requirements. By associating a path in a lattice with each alignment, geometric insight can be brought to the problem of finding an optimal alignment, and this gives an obvious encoding of each path. The problem can be solved by applying a genetic algorithm, which is more efficient than the dynamic programming and hidden Markov model approaches commonly used now.
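The lattice-path view described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the actual encoding or cost function, so the move letters (`D`, `X`, `Y`), the function names, and the unit mismatch and gap costs are hypothetical choices.

```python
def decode_path(x, y, path):
    """Turn a lattice path into a gapped alignment of x and y.

    Assumed move alphabet (not from the source):
      'D' = diagonal step, aligns the next character of x with the next of y
      'X' = step along x only, aligns the next character of x with a gap
      'Y' = step along y only, aligns a gap with the next character of y
    """
    if set(path) - set("DXY"):
        raise ValueError("path may only contain the moves D, X and Y")
    # A valid path runs from (0, 0) to (len(x), len(y)) in the lattice.
    if path.count("D") + path.count("X") != len(x):
        raise ValueError("path does not consume exactly len(x) characters")
    if path.count("D") + path.count("Y") != len(y):
        raise ValueError("path does not consume exactly len(y) characters")

    i = j = 0
    top, bottom = [], []
    for move in path:
        if move == "D":
            top.append(x[i]); bottom.append(y[j]); i += 1; j += 1
        elif move == "X":
            top.append(x[i]); bottom.append("-"); i += 1
        else:  # move == "Y"
            top.append("-"); bottom.append(y[j]); j += 1
    return "".join(top), "".join(bottom)


def alignment_distance(top, bottom, mismatch=1, gap=1):
    """Sum column costs of a gapped alignment (assumed unit costs)."""
    cost = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            cost += gap
        elif a != b:
            cost += mismatch
    return cost
```

A search method such as a genetic algorithm could treat the move string as the candidate solution and `alignment_distance` as the quantity to minimise; how the cited work actually does this is not stated in the abstract.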
Funding: National Natural Science Foundation of China (11771393, 11632015); Natural Science Foundation of Zhejiang Province, China (LZ14A010002).
Abstract: K-mers can be used to describe biological sequences, and the k-mer distribution is a tool for solving sequence analysis problems in bioinformatics. The k-mer vector can serve as a representation of the k-mer distribution of a biological sequence. Problems such as similarity calculation or sequence assembly can be described in the k-mer vector space. This helps to identify new features of old sequence-based problems in bioinformatics and to develop new algorithms using concepts and methods from linear space theory. In this study, we define the k-mer vector space for generalized biological sequences and explain the meaning of the corresponding vector operations in the biological context. We present the vector/matrix form of several widely seen sequence-based problems, including read quantification, sequence assembly, and pattern detection, and discuss the advantages and disadvantages of this form. We also implement a tool for the sequence assembly problem based on k-mer vector methods, which shows the practicability and convenience of this algorithm design strategy.
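A minimal sketch of the k-mer vector idea, assuming a DNA alphabet and lexicographic ordering of k-mers. The function names, the handling of non-alphabet characters, and the choice of cosine similarity as the comparison are illustrative assumptions, not the authors' definitions.

```python
import math
from itertools import product


def kmer_vector(seq, k, alphabet="ACGT"):
    """Count vector of length len(alphabet)**k, one coordinate per k-mer.

    Coordinates follow the order produced by itertools.product, which is
    lexicographic for a sorted alphabet. Windows containing characters
    outside the alphabet are skipped (an assumption of this sketch).
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vec = [0] * len(index)
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:
            vec[index[kmer]] += 1
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With sequences represented this way, a similarity calculation reduces to ordinary vector arithmetic, which is the kind of reformulation the abstract describes.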
Abstract: The concept of SBCR was put forward to treat sauce wastewater. Further study showed that adding an appropriate amount of calcium chloride to the SBR can improve effluent quality. The removal rate of COD and color was 84% and 80%, 36%, 96% higher than those of traditional SBR respectively. The results of continuous experiments and biophase observation showed that calcium chloride accumulation increased sludge production slightly, while the dewatering characteristics of the sludge improved.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 10771133).
Abstract: Suboptimal alignments always reveal additional interesting biological features and have been successfully used to informally estimate the significance of an optimal alignment. Moreover, traditional dynamic programming algorithms for sequence comparison require quadratic space and hence are infeasible for long protein or DNA sequences. In this paper, a space-efficient sampling algorithm for computing suboptimal alignments is described. The algorithm uses a general gap model, in which the cost associated with gaps is given by an affine score, and randomly selects an alignment according to the distribution of weights of all potential alignments. If x and y are two sequences with lengths n and m, respectively, then the space requirement of this algorithm is linear in the sum of n and m. Finally, an example illustrates the utility of the algorithm.
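To illustrate the affine gap model and the idea of avoiding the quadratic table, here is a sketch of a score-only global alignment that keeps just two rows. It is a standard three-state (Gotoh-style) recurrence, not the sampling algorithm of the paper, and the scoring parameters are assumptions chosen for the example. A gap of length L is charged `gap_open + gap_extend * L`.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, gap_open=2, gap_extend=1):
    """Best global alignment score under affine gaps, using O(len(y)) space.

    M[j]: best score with x[i-1] aligned to y[j-1]
    X[j]: best score with x[i-1] aligned to a gap
    Y[j]: best score with a gap aligned to y[j-1]
    Only the previous and current rows are stored.
    """
    m = len(y)
    first = gap_open + gap_extend  # penalty for the first character of a gap

    # Row 0: only leading gaps in x are possible.
    M = [NEG_INF] * (m + 1)
    X = [NEG_INF] * (m + 1)
    Y = [NEG_INF] * (m + 1)
    M[0] = 0
    for j in range(1, m + 1):
        Y[j] = -(gap_open + gap_extend * j)

    for i in range(1, len(x) + 1):
        M2 = [NEG_INF] * (m + 1)
        X2 = [NEG_INF] * (m + 1)
        Y2 = [NEG_INF] * (m + 1)
        X2[0] = -(gap_open + gap_extend * i)  # leading gap in y
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M2[j] = s + max(M[j - 1], X[j - 1], Y[j - 1])
            X2[j] = max(M[j] - first, X[j] - gap_extend, Y[j] - first)
            Y2[j] = max(M2[j - 1] - first, Y2[j - 1] - gap_extend, X2[j - 1] - first)
        M, X, Y = M2, X2, Y2

    return max(M[m], X[m], Y[m])
```

This recovers only the score. Recovering or sampling an actual alignment in linear space requires additional machinery, which is the contribution the abstract describes.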
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61922020, 61771331, 61902259).
Abstract: The kernel method, especially the kernel-fusion method, is widely used in social networks, computer vision, bioinformatics, and other applications. It deals effectively with nonlinear classification problems: it can map linearly inseparable biological sequence data from a low-dimensional to a high-dimensional space for more accurate differentiation, which enables the use of kernel methods to predict the structure and function of sequences. The kernel method is therefore significant in the solution of bioinformatics problems. Various kernels applied in bioinformatics are explained clearly, which can help readers select appropriate kernels for different tasks. Massive amounts of biological sequence data occur in practical applications. Research on using machine learning methods to obtain knowledge, and on how to explore the structure and function of biological methods for theoretical prediction, has always been emphasized in bioinformatics. The kernel method has gradually become an important learning algorithm that is widely used in gene expression and biological sequence prediction. This review focuses on the requirements of classification tasks on biological sequence data. It studies kernel methods and optimization algorithms, including methods of constructing kernel matrices based on the characteristics of biological sequences and kernel fusion methods within a multiple kernel learning framework.
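As a concrete instance of a sequence kernel and of kernel fusion, the sketch below builds spectrum (shared k-mer count) kernel matrices and combines them with fixed non-negative weights. The spectrum kernel is one well-known string kernel and is used here as an assumed example; the review's own kernel choices and weight-learning procedures are not given in the abstract, and the function names are hypothetical.

```python
from collections import Counter


def spectrum_kernel(s, t, k):
    """Inner product of the k-mer count vectors of s and t."""
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(c * counts_t[kmer] for kmer, c in counts_s.items())


def kernel_matrix(seqs, k):
    """Gram matrix of the spectrum kernel over a list of sequences."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


def fuse_kernels(matrices, weights):
    """Weighted sum of Gram matrices of equal size.

    With non-negative weights, a sum of positive semidefinite kernel
    matrices is again positive semidefinite, so the result is a valid kernel.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(matrices[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]
```

In a multiple kernel learning setting the weights would be learned rather than fixed; the fused matrix can then be passed to any classifier that accepts a precomputed kernel.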
Funding: Supported by the National Natural Science Foundation of China (No. 62062039) and the Natural Science Foundation of Jiangxi Province (Nos. 20202BAB202024 and 20212BAB202017).
Abstract: The problems of biological sequence analysis have great theoretical and practical value in modern bioinformatics. Numerous algorithms are used to solve these problems, and complex similarities and differences exist among the algorithms for the same problem, which makes it difficult for researchers to select the appropriate one. To address this situation, the paper combines the formal partition-and-recur method, component technology, domain engineering, and generic programming to present a method for developing a family of biological sequence analysis algorithms. It designs highly trustworthy, reusable domain algorithm components and then assembles them to generate specific biological sequence analysis algorithms. An experiment on the development of a dynamic-programming-based LCS algorithm family shows that the proposed method improves the reliability, understandability, and development efficiency of particular algorithms.
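For readers unfamiliar with the LCS (longest common subsequence) problem named above, a plain dynamic programming solution can be sketched as follows. This is the textbook table-and-traceback formulation, not the component-based algorithm family of the paper, and the function name is an assumption.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of a and b.

    table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one optimal answer.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Variants in such a family typically differ in what they return (length only, one subsequence, all subsequences) and in how much of the table they keep, which is where reusable components become useful.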
Abstract: Tools for pairwise bio-sequence alignment have long played a central role in computational biology. Several algorithms for bio-sequence alignment have been developed. The Smith-Waterman algorithm, based on dynamic programming, is considered the most fundamental alignment algorithm in bioinformatics. However, the existing parallel Smith-Waterman algorithm needs a large memory space, and this disadvantage limits the size of the sequences that can be handled. As biological sequence data expand rapidly, the memory requirement of the existing parallel Smith-Waterman algorithm has become a critical problem. To solve this problem, we develop a new parallel bio-sequence alignment algorithm based on the divide-and-conquer strategy, named the PSW-DC algorithm. In our algorithm, we first partition the query sequence into several subsequences and distribute them to the processors; we then compare each subsequence with the whole subject sequence in parallel, using the Smith-Waterman algorithm, to get an interim result; finally, we obtain the optimal alignment between the query sequence and the subject sequence through a special combination and extension method. The memory space required by our algorithm is reduced significantly in comparison with existing ones. We also develop a key technique of combination and extension, named the C&E method, to manipulate the interim results and obtain the final sequence alignment. We implement the new parallel bio-sequence alignment algorithm, PSW-DC, on a cluster parallel system.
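The local alignment step that each processor would run on its subsequence can be sketched as a serial Smith-Waterman score computation. This sketch assumes a linear gap penalty and illustrative scoring values, keeps only two rows of the table, and returns the score only. It does not attempt the partitioning or the C&E combination step, whose details are not given in the abstract.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between query and subject.

    Uses the standard recurrence with a floor at zero, so an alignment
    may start and end anywhere in either sequence. Only the previous
    and current rows are stored.
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        cur = [0] * (len(subject) + 1)
        for j in range(1, len(subject) + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            cur[j] = max(
                0,                 # start a new local alignment here
                prev[j - 1] + s,   # extend with a match or mismatch
                prev[j] + gap,     # gap in subject
                cur[j - 1] + gap,  # gap in query
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

Scoring query chunks independently is straightforward; the hard part, as the abstract notes, is combining the interim results so that alignments crossing chunk boundaries are not lost.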
Funding: Financially supported by the National Natural Science Foundation of China (No. 51221892) and the Major Science and Technology Program for Water Pollution Control and Treatment (No. 2010ZX07319-001-03).
Abstract: Sulfur dioxide (SO2) is often released during the combustion of fossil fuels. An integrated bioreactor with two sections, namely a suspended zone (SZ) and an immobilized zone (IZ), was applied to treat SO2 for 6 months. Sampling ports were set in both sections to investigate the performance and microbial characteristics of the integrated bioreactor. SO2 was effectively removed by the synergistic effect of the SZ and IZ, and a removal efficiency of more than 85% was achieved at steady state. The average elimination capacity of SO2 in the bioreactor was 2.80 g/(m3·hr) for the SZ and 1.50 g/(m3·hr) for the IZ. Most SO2 was eliminated in the SZ. The liquid level of the SZ and the water content ratio of the packing material in the IZ affected SO2 removal efficiency. The SZ served a key function not only in SO2 elimination but also in moisture maintenance for the IZ. The desired water content in the IZ could be feasibly maintained without any additional pre-humidification facilities. Clone libraries of 16S rDNA directly amplified from the DNA of each sample were constructed and sequenced to analyze the community composition and diversity in the individual zones. Desulfurization bacteria dominated both zones. Paenibacillus sp. was present in both zones, whereas Ralstonia sp. existed only in the SZ. The transfer of SO2 to the SZ involved dissolution in the nutrient solution and biodegradation by the sulfur-oxidizing bacteria. This work presents a potential biological treatment method for waste gases containing hydrophilic compounds.