Abstract: Algorithms based on combination learning usually outperform a single classification algorithm on the task of protein secondary structure prediction. However, the assignment of weights to the base classifiers usually lacks decision-making evidence. In this paper, we propose a protein secondary structure prediction method with a dynamic self-adaptation combination strategy based on entropy, where the weights are assigned according to the entropy of the posterior probabilities output by the base classifiers. A higher entropy value means a lower weight for the base classifier. The final structure prediction is decided by the weighted combination of posterior probabilities. Extensive experiments on the CB513 dataset demonstrate that the proposed method outperforms existing methods and can effectively improve prediction performance.
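The entropy-based weighting described in this abstract can be illustrated with a minimal sketch. The abstract states only that higher entropy means lower weight; the specific rule used here (weight proportional to the gap between the maximum possible entropy and the classifier's entropy) and the names `entropy`, `entropy_weights`, and `combine` are assumptions made for illustration, not the authors' formula.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of one posterior distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_weights(posteriors):
    """Assumed rule: weight_i is proportional to (H_max - H_i), H_max = ln(K).

    A confident classifier (low entropy) therefore gets a high weight.
    """
    n_classes = len(posteriors[0])
    h_max = math.log(n_classes)
    raw = [max(h_max - entropy(p), 0.0) for p in posteriors]
    total = sum(raw)
    if total < 1e-12:  # all classifiers maximally uncertain: fall back to equal weights
        return [1.0 / len(posteriors)] * len(posteriors)
    return [r / total for r in raw]

def combine(posteriors):
    """Weighted sum of posteriors; returns (best class index, combined, weights)."""
    weights = entropy_weights(posteriors)
    n_classes = len(posteriors[0])
    combined = [
        sum(w * p[k] for w, p in zip(weights, posteriors))
        for k in range(n_classes)
    ]
    best = max(range(n_classes), key=combined.__getitem__)
    return best, combined, weights

# Hypothetical posteriors over (helix, strand, coil) for a single residue.
posteriors = [
    [0.80, 0.10, 0.10],  # confident base classifier
    [0.30, 0.40, 0.30],  # uncertain base classifier
    [0.34, 0.33, 0.33],  # nearly uniform base classifier
]
best, combined, weights = combine(posteriors)
```

Because the weights are recomputed from the posteriors of each residue, the combination adapts per prediction rather than using one fixed weight per classifier, which is one way to read "dynamic self-adaptation".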
Abstract: Protein structure prediction is one of the most essential objectives pursued in theoretical chemistry and bioinformatics, as it is of vital importance in medicine, biotechnology, and other fields. Protein secondary structure prediction (PSSP) plays a significant role in the prediction of protein tertiary structure, as it bridges the gap between protein primary sequences and tertiary structure prediction. Protein secondary structures are classified into two categories: the 3-state category and the 8-state category. Predicting the 3 states and the 8 states of secondary structures from protein sequences are called the Q3 prediction and the Q8 prediction problems, respectively. The 8 classes of secondary structures reveal more precise structural information for a variety of applications than the 3 classes. However, Q8 prediction has been found to be very challenging, which is why all previous work in PSSP has focused on Q3 prediction. In this paper, we develop an ensemble Machine Learning (ML) approach for Q8 PSSP to explore the performance of ensemble learning algorithms compared to that of individual ML algorithms in Q8 PSSP. The ensemble members considered for constructing the ensemble models are well-known classifiers, namely SVM (Support Vector Machines), KNN (K-Nearest Neighbor), DT (Decision Tree), RF (Random Forest), and NB (Naïve Bayes), with two feature extraction techniques, namely LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis). Experiments were conducted to evaluate the performance of single models and ensemble models, with PCA and LDA, in Q8 PSSP. The novelty of this paper lies in the introduction of ensemble learning to the Q8 PSSP problem. The experimental results confirmed that ensemble ML models are more accurate than individual ML models. They also indicated that features extracted by LDA are more effective than those extracted by PCA.
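To make the difference between the Q3 and Q8 problems concrete, the sketch below scores one prediction under both schemes. The 8-to-3 reduction table is a commonly used convention (helix-like states to H, strand-like states to E, the rest to coil) and is an assumption here, since the abstract does not specify a mapping. The letter `L` for the loop state and the two example strings are also hypothetical.

```python
# Assumed reduction of the eight states to three classes; variants exist.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",   # helix-like states
    "E": "E", "B": "E",             # strand-like states
    "T": "C", "S": "C", "L": "C",   # turn, bend, loop -> coil
}

def to_q3(states):
    """Map a string of 8-state labels to 3-state labels."""
    return "".join(Q8_TO_Q3[s] for s in states)

def accuracy(predicted, observed):
    """Fraction of residues whose predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical 17-residue example.
observed = "LHHHHGGGTTEEEEBSL"
predicted = "LHHHHHHHTSEEEEESL"

q8 = accuracy(predicted, observed)                 # per-residue, 8 classes
q3 = accuracy(to_q3(predicted), to_q3(observed))   # same residues, 3 classes
```

In this example every disagreement is between two states that fall into the same 3-state class, so the Q3 score is perfect while the Q8 score is not, which illustrates why Q8 is the harder target.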
Abstract: The secondary structure of a protein is critical for establishing a link between the protein primary and tertiary structures. For this reason, it is important to design methods for accurate protein secondary structure prediction. Most of the existing computational techniques for protein structural and functional prediction are based on machine learning with shallow frameworks. Different deep learning architectures have already been applied to the protein secondary structure prediction problem. In this study, deep learning based models, i.e., a convolutional neural network and a long short-term memory network, were proposed for protein secondary structure prediction. The input to the proposed models is amino acid sequences derived from the CulledPDB dataset. Hyperparameter tuning with cross-validation was employed to obtain the best parameters for the proposed models. The proposed models enable effective processing of amino acids and attain approximately 87.05% and 87.47% Q3 accuracy for the convolutional neural network and long short-term memory models, respectively.
Abstract: A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modifications to the energy parameters of various loops were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were permitted to occur if reasonable. We have applied this approach to a number of RNA sequences. The prediction accuracies we obtained were higher than those in published papers.
Funding: Supported by the National Natural Science Foundation of China (No. 60373044) and the Knowledge Innovative Project of CAS (No. KSCX2-SW-233).
Abstract: The architecture of a BioAccel (internal code) chip for RNA secondary structure prediction is described in this letter. The system is based on a BioBus (internal code), whose distinguishing features are two separate control and data channels and a slave-associated arbitration scheme. Two reference systems, based on the AMBA AHB bus and the CoreConnect bus, are introduced to evaluate the performance of the system. The simulation results are encouraging: the average communication bandwidth of the chip is increased severalfold, and the read and write latencies are reduced by about 40 percent.
Funding: This project is supported by the Provincial Natural Science Foundation of Jiangsu, China (No. BK2004406) and the Provincial Innovation Foundation for Graduate Students of Jiangsu, China (No. 1223000053).
Abstract: The three-dimensional turbulent flow through an entire centrifugal pump is simulated using a k-ε turbulence model modified for rotation and curvature, the SIMPLEC method, and body-fitted coordinates. The velocity and pressure fields are obtained for the pump under various working conditions and are used to predict the head and hydraulic efficiency of the pump; the results correspond well with the measured values. The calculation results indicate the following. The pressure is higher on the pressure side of the blade than on the suction side. The relative velocity on the suction side gradually decreases from the impeller inlet to the outlet, while it increases on the pressure side, which finally results in a lower relative velocity on the suction side and a higher one on the pressure side at the impeller outlet. The impeller flow field is asymmetric, i.e., the velocity and pressure fields are totally different among the channels of the impeller. In the volute, the static pressure gradually increases along the flow route, and a large pressure gradient occurs at the tongue. Secondary flow exists in the rear part of the spiral.
Funding: Supported by an NIH CIPRA grant (U19A151915-03). It was also supported by the China 863 National High Technology Research and Development Project (2006AA02Z418) and the China 973 National Key Project (2005 CB522903).
Abstract: Objective: To study the specific amino acid variation in Nef that may be related to disease progression after infection with HIV-1 subtype B, a predominant strain circulating in China, and to determine whether changes in Nef secondary structure may influence different stages of AIDS development. The study is based on the concept that the Nef gene of HIV dramatically alters the severity of viral infection, virus replication, and disease progression, and that long-term non-progressors (LTNP) of HIV infection are commonly associated with either a deletion of the Nef gene or defective Nef alleles. Methods: The study subjects were divided into LTNP1 (n=14), LTNP2 (n=16), and slow progressor (SP, n=19) groups for mutational analysis of the Nef sequence. The data were obtained by using BioEdit, MEGA, Anthewin, and SAS software. Results: Residues TA48/49 and K151 in Nef occurred more frequently in the LTNP group, while AA48/49 was more frequently observed in the SP group. Of the differences observed in the secondary structure comparison using Nef consensus sequences of these three groups, one roughly corresponded to the Nef 48/49 mutation site. Conclusion: TA48/49, K151, and AA48/49 in the Nef gene might be associated with the different stages of HIV infection, and there may be a link between Nef secondary structure and the progression of HIV-1 infection.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos 90103031, 10474041, 90403120 and 10021001), and the Nonlinear Project (973) of the NSM.
Abstract: The folding dynamics and structural characteristics of peptides RTKAWNRQLYPEW (P1) and RTKQLYPEW (P2) are investigated in this work by using the all-atom simulation program CHARMM. The results show that P1, a segment of an antigen, has an α-helix folding motif, whereas P2, which is derived by deleting the four residues AWNR from peptide P1, does not form a helix and presents a β-strand. Peptide P1 also experiences a more rugged energy landscape than peptide P2. From our results, it is inferred that the antibody CD8 cytolytic T lymphocyte prefers an antigen with a β-folding structure to one with an α-helical structure.
Abstract: Background: Structure profiling experiments provide single-nucleotide information on RNA structure. Recent advances in chemistry combined with the application of high-throughput sequencing have enabled structure profiling at transcriptome scale and in living cells, creating unprecedented opportunities for RNA biology. Propelled by these experimental advances, massive data with ever-increasing diversity and complexity have been generated, which gives rise to new challenges in interpreting and analyzing these data. Results: We review current practices in the analysis of structure profiling data, with emphasis on comparative and integrative analysis, and highlight emerging questions. Comparative analysis has revealed structural patterns across transcriptomes and has become an integral component of recent profiling studies. Additionally, profiling data can be integrated into traditional structure prediction algorithms to improve prediction accuracy. Conclusions: To keep pace with experimental developments, methods to facilitate, enhance, and refine such analyses are needed. Parallel advances in analysis methodology will complement profiling technologies and help them reach their full potential.
Funding: PGB was supported by a scholarship from the State Scholarships Foundation of Greece (SSF) for postdoctoral research in the Department of Cell Biology and Biophysics of the University of Athens (Machine Learning Algorithms for Bioinformatics).
Abstract: It has been shown that progress in the determination of membrane protein structures grows exponentially, with approximately the same growth rate as that of water-soluble proteins. In order to investigate the effect of this growth on the performance of prediction algorithms for both α-helical and β-barrel membrane proteins, we conducted a prospective study based on historical records. We trained separate hidden Markov models with training sets of different sizes and evaluated their performance on topology prediction for the two classes of transmembrane proteins. We show that the existing top-scoring algorithms for predicting the transmembrane segments of α-helical membrane proteins perform slightly better than those for β-barrel outer membrane proteins in all measures of accuracy. With the same rationale, a meta-analysis of the performance of secondary structure prediction algorithms indicates that existing algorithmic techniques cannot be further improved by just adding more non-homologous sequences to the training sets. The upper limit for secondary structure prediction is estimated to be no more than 70% and 80% of correctly predicted residues for single-sequence-based methods and multiple-sequence-based ones, respectively. Therefore, we should concentrate our efforts on utilizing new techniques for the development of even better scoring predictors.
Funding: This work was supported by the National Basic Research Program of China (2003CB114400) and the National Natural Science Foundation of China (Grant No. 30100035).
Abstract: Copper and iron play important roles in a variety of biological processes, especially when chelated with proteins. The proteins involved in metal binding, transport, and metabolism have aroused much interest. To facilitate the study of this topic, we constructed two databases (DCCP and DICP) containing the known copper- and iron-chelating proteins, which are freely available from the website http://sdbi.sdut.edu.cn/en. Users can conveniently search and browse all of the entries in the databases. Based on the two databases, bioinformatic analyses were performed, which provided some novel insights into metalloproteins.
Funding: ...the NSERC for this research under Research Grant Number RG-PIN 238298. Both authors would like to acknowledge the support of the InfoNet Media Centre, funded by the Canadian Foundation for Innovation (CFI) under grant number CFI-3648.
Abstract: Purpose: The purpose of this paper is to present a study of the effect of different types of annealing schedules for a ribonucleic acid (RNA) secondary structure prediction algorithm based on simulated annealing (SA). Design/methodology/approach: An RNA folding algorithm was implemented that assembles the final structure from potential substructures (helices). Structures are encoded as a permutation of helices. An SA searches this space of permutations. Parameters and annealing schedules were studied and fine-tuned to optimize algorithm performance. Findings: In comparison with mfold, the SA algorithm shows comparable results (in terms of F-measure) even with a less sophisticated thermodynamic model. In terms of average specificity, the SA algorithm has provided superior results. Research limitations/implications: Most of the underlying thermodynamic models are too simplistic and incomplete to accurately model the free energy of larger structures. This is the largest limitation of free energy-based RNA folding algorithms in general. Practical implications: The algorithm offers a different approach that can be used in practice to fold RNA sequences quickly. Originality/value: The algorithm is one of only two SA-based RNA folding algorithms. The authors use a very different encoding, based on permutations of candidate helices. The in-depth study of annealing schedules and other parameters makes the algorithm a strong contender. Another benefit is that new thermodynamic models can be incorporated with relative ease (which is not the case for algorithms based on dynamic programming).
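The idea of searching a space of permutations with simulated annealing under different annealing schedules can be sketched generically. This is not the authors' algorithm: the energy function below is a toy stand-in for a thermodynamic model, the swap move is one simple neighborhood choice, and the names `geometric`, `linear`, `toy_energy`, and `anneal` are hypothetical.

```python
import math
import random

def geometric(t0, alpha):
    """Geometric cooling: T_k = t0 * alpha**k."""
    return lambda k: t0 * alpha ** k

def linear(t0, n_steps):
    """Linear cooling: T_k falls from t0 to 0 over n_steps."""
    return lambda k: t0 * max(0.0, 1.0 - k / n_steps)

def toy_energy(perm):
    """Toy stand-in for a free-energy model; minimal for the identity order."""
    return sum(abs(v - i) for i, v in enumerate(perm))

def anneal(start, energy, schedule, n_steps, rng):
    """Simulated annealing over permutations using random pairwise swaps."""
    current = list(start)
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for k in range(n_steps):
        t = schedule(k)
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        delta = energy(current) - e_cur
        # Metropolis criterion: always accept improvements, sometimes accept
        # worse moves while the temperature is still positive.
        if delta <= 0 or (t > 0 and rng.random() < math.exp(-delta / t)):
            e_cur += delta
            if e_cur < e_best:
                best, e_best = list(current), e_cur
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return best, e_best

rng = random.Random(0)  # fixed seed so runs are repeatable
start = [5, 4, 3, 2, 1, 0]
best, e_best = anneal(start, toy_energy, geometric(5.0, 0.95), 500, rng)
```

Swapping `geometric(5.0, 0.95)` for `linear(5.0, 500)` changes only the schedule, which is the kind of comparison the abstract describes. Replacing `toy_energy` with another scoring function requires no change to the search loop, which mirrors the abstract's point that new thermodynamic models are easy to plug in.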