[Objective] To examine a grammar model based on lexical substring extraction for RNA secondary structure prediction. [Method] By introducing the cloud model into the stochastic grammar model, a machine learning algorithm suitable for the lexicalized stochastic grammar model was proposed. The word grid mode was used to extract and divide RNA sequences into lexical substrings, and the cloud classifier was used to find the maximum probability with which each lemma is labeled as a given secondary structure type. The lemma information was then introduced into the training of the stochastic grammar as prior information, realizing prediction of RNA secondary structure, and the method was tested experimentally. [Result] The experimental results showed that the stochastic grammar cloud model significantly improved prediction accuracy and search speed compared with prediction by a plain stochastic grammar. [Conclusion] This study lays the foundation for the wide application of stochastic grammar models to RNA secondary structure prediction.
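The cloud model mentioned above is, in its usual form, a normal cloud described by an expectation Ex, an entropy En and a hyper-entropy He. The following is a minimal Python sketch of a generic forward normal cloud generator, plus a simple classifier that assigns a feature value to the label whose cloud gives it the highest average certainty degree. It is not the authors' cloud classifier: the labels `stem` and `loop`, the cloud parameters and the one-dimensional lemma feature are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class NormalCloud:
    ex: float  # expectation: centre of the concept
    en: float  # entropy: spread of the concept
    he: float  # hyper-entropy: uncertainty of the entropy itself

    def _perturbed_entropy(self, rng):
        # En' ~ N(En, He^2); kept strictly positive to avoid division by zero
        return max(abs(rng.gauss(self.en, self.he)), 1e-12)

    def drops(self, n, rng):
        """Forward normal cloud generator: return n (x, certainty) cloud drops."""
        out = []
        for _ in range(n):
            en_i = self._perturbed_entropy(rng)
            x = rng.gauss(self.ex, en_i)
            mu = math.exp(-((x - self.ex) ** 2) / (2 * en_i ** 2))
            out.append((x, mu))
        return out

    def certainty(self, x, rng, samples=200):
        """Average certainty degree of a given value x under this cloud."""
        total = 0.0
        for _ in range(samples):
            en_i = self._perturbed_entropy(rng)
            total += math.exp(-((x - self.ex) ** 2) / (2 * en_i ** 2))
        return total / samples


def classify(x, clouds, seed=0):
    """Return the label whose cloud gives x the highest average certainty."""
    rng = random.Random(seed)
    return max(clouds, key=lambda label: clouds[label].certainty(x, rng))


# Hypothetical clouds over a made-up 1-D lemma feature
clouds = {
    "stem": NormalCloud(ex=0.8, en=0.1, he=0.01),
    "loop": NormalCloud(ex=0.2, en=0.1, he=0.01),
}
print(classify(0.75, clouds))  # stem
```

Drawing many drops with `drops` gives the familiar cloud-shaped scatter of (x, certainty) points; the hyper-entropy He controls how thick that cloud is.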
Membrane proteins are an important class of proteins embedded in cell membranes and play crucial roles in living organisms, for example as ion channels, transporters, and receptors. Because it is difficult to determine membrane protein structures by wet-lab experiments, accurate and fast amino acid sequence-based computational methods are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue-residue contacts, and the relative accessible surface area of α-helical membrane proteins. MemBrain achieves prediction accuracies of 97.9% for ATMH and 87.1% for AP, with an N-score of 3.2 ± 3.0 and a C-score of 3.1 ± 2.8. MemBrain-Contact obtains 62% and 64.1% accuracy for the top L/5 predicted contacts on the training and independent datasets, respectively. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593. These results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
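Top-L/5 contact accuracy, as reported for MemBrain-Contact, is usually computed as the precision of the L/5 highest-scoring predicted contacts, where L is the sequence length. The Python sketch below assumes predictions are given as a dict from residue pairs to scores and, as a hypothetical default, ignores pairs closer than 6 residues in sequence; the minimum sequence separation MemBrain actually uses is not stated here.

```python
def top_l5_precision(scores, true_contacts, length, min_sep=6):
    """Precision of the top L/5 predicted residue-residue contacts.

    scores: dict mapping (i, j) residue pairs (i < j) to predicted contact scores.
    true_contacts: set of (i, j) pairs in contact in the native structure.
    min_sep: hypothetical minimum sequence separation |i - j| for a pair to count.
    """
    candidates = [(s, pair) for pair, s in scores.items() if pair[1] - pair[0] >= min_sep]
    candidates.sort(reverse=True)  # highest scores first
    k = max(1, length // 5)
    top = [pair for _, pair in candidates[:k]]
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top) if top else 0.0


scores = {(0, 10): 0.9, (1, 12): 0.8, (2, 15): 0.7, (3, 9): 0.6, (4, 18): 0.5, (0, 2): 0.95}
true_contacts = {(0, 10), (2, 15), (4, 18)}
print(top_l5_precision(scores, true_contacts, length=20))  # 0.5
```

Here L = 20, so the 4 best-scoring pairs are kept; (0, 2) is excluded as too local, and 2 of the 4 kept pairs are true contacts.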
Many recent exciting discoveries have revealed the versatility of RNAs and their importance in a variety of cellular functions, which are strongly coupled to RNA structures. To understand the functions of RNAs, a number of structure prediction models have been developed in recent years. In this review, the progress in computational models for RNA structure prediction is introduced and the distinguishing features of many outstanding algorithms are discussed, with emphasis on three-dimensional (3D) structure prediction. A promising coarse-grained model for predicting RNA 3D structure, stability and salt effects is also introduced briefly. Finally, we discuss the major challenges in RNA 3D structure modeling.
Algorithms based on combination (ensemble) learning are usually superior to single classification algorithms for protein secondary structure prediction. However, the weights assigned to the base classifiers usually lack a principled basis. In this paper, we propose a protein secondary structure prediction method with a dynamic self-adaptive combination strategy based on entropy, in which the weights are assigned according to the entropy of the posterior probabilities output by the base classifiers: the higher the entropy, the lower the weight of the base classifier. The final structure prediction is decided by the weighted combination of the posterior probabilities. Extensive experiments on the CB513 dataset demonstrate that the proposed method outperforms existing methods and effectively improves prediction performance.
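The abstract fixes only the direction of the weighting rule (higher entropy, lower weight). The Python sketch below assumes one concrete choice, weights proportional to 1/(H + ε) and normalized to sum to 1, recomputed for each residue from the 3-state (H/E/C) posteriors of each base classifier; the exact weighting function used in the paper may differ.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def combine(posteriors, eps=1e-6):
    """Entropy-weighted combination of per-classifier posteriors for one residue.

    posteriors: list of probability vectors, one per base classifier.
    Returns (combined probability vector, normalized weights).
    """
    raw = [1.0 / (entropy(p) + eps) for p in posteriors]  # assumed weighting rule
    total = sum(raw)
    weights = [w / total for w in raw]
    n_classes = len(posteriors[0])
    combined = [sum(w * p[c] for w, p in zip(weights, posteriors)) for c in range(n_classes)]
    return combined, weights


STATES = "HEC"
posteriors = [
    [0.90, 0.05, 0.05],  # confident classifier (low entropy)
    [0.30, 0.40, 0.30],  # uncertain classifier (high entropy)
]
combined, weights = combine(posteriors)
print(STATES[max(range(3), key=combined.__getitem__)])  # H
```

Because the weights are recomputed for every residue, a classifier that is confident on one residue and unsure on the next is trusted accordingly, which is one way to read the "dynamic self-adaptation" in the abstract.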
In this paper, the applications of evolutionary algorithms (EAs) to the prediction of protein secondary and tertiary structures are introduced, recent studies that solve protein structure prediction problems with EAs are reviewed, and the challenges and prospects of applying EAs to protein structure modeling are analyzed and discussed.
The structure type of the crystal of 4,4'-bis-(2-hydroxy-ethoxyl)-biphenyl 1 has been predicted using the previously developed interfacial model for small organic molecules. Based on the calculated ratio of hydrophobic to hydrophilic volume of 1, the model predicts a crystal structure of lamellar or bicontinuous type, which has been confirmed by X-ray single-crystal structure analysis (C20H26O6, monoclinic, P2₁/c, a = 16.084(1), b = 6.0103(4), c = 9.6410(7) Å, β = 103.014(2)°, V = 908.1(1) Å³, Z = 2, Dc = 1.325 g/cm³, F(000) = 388, μ = 0.097 mm⁻¹, Mo Kα radiation, λ = 0.71073 Å; R = 0.0382 and wR = 0.0882 for I > 2σ(I); 7121 reflections collected, 1852 unique reflections, 170 parameters). As predicted, the hydrophobic and hydrophilic portions of 1 are arranged in lamellae. The same interfacial model is also applied to predict the structure type of other amphiphilic small organic molecules.
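The reported cell volume and calculated density can be cross-checked from the other crystallographic data: for a monoclinic cell V = abc·sin β, and Dc = Z·M/(V·N_A). The Python sketch below uses standard atomic weights (C 12.011, H 1.008, O 15.999) and the reported V when computing Dc.

```python
import math

N_A = 6.02214076e23  # Avogadro constant, mol^-1
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # standard atomic weights, g/mol


def monoclinic_volume(a, b, c, beta_deg):
    """Unit-cell volume in Å^3 for a monoclinic cell: V = a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def calculated_density(formula, z, volume_a3):
    """Dc = Z*M / (V*N_A) in g/cm^3, converting Å^3 to cm^3 (1 Å^3 = 1e-24 cm^3)."""
    molar_mass = sum(ATOMIC_MASS[el] * n for el, n in formula.items())
    return z * molar_mass / (volume_a3 * 1e-24 * N_A)


V = monoclinic_volume(16.084, 6.0103, 9.6410, 103.014)
Dc = calculated_density({"C": 20, "H": 26, "O": 6}, z=2, volume_a3=908.1)
print(f"V = {V:.1f} Å^3, Dc = {Dc:.3f} g/cm^3")  # V = 908.1 Å^3, Dc = 1.325 g/cm^3
```

Both values agree with the reported V = 908.1(1) Å³ and Dc = 1.325 g/cm³ for C20H26O6 with Z = 2.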
Protein structure prediction is one of the most essential objectives pursued in theoretical chemistry and bioinformatics, as it is of vital importance in medicine, biotechnology and beyond. Protein secondary structure prediction (PSSP) plays a significant role in predicting protein tertiary structure, as it bridges the gap between protein primary sequences and tertiary structure prediction. Protein secondary structures are classified into two categories: a 3-state category and an 8-state category. Predicting the 3 states and the 8 states of secondary structure from protein sequences are called the Q3 and Q8 prediction problems, respectively. The 8 classes of secondary structure reveal more precise structural information for a variety of applications than the 3 classes; however, Q8 prediction has been found to be very challenging, which is why previous PSSP work has focused mainly on Q3 prediction. In this paper, we develop an ensemble machine learning (ML) approach for Q8 PSSP to compare the performance of ensemble learning algorithms with that of individual ML algorithms. The ensemble members are well-known classifiers, namely SVM (Support Vector Machines), KNN (K-Nearest Neighbors), DT (Decision Tree), RF (Random Forest), and NB (Naïve Bayes), combined with two feature extraction techniques, namely LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis). Experiments were conducted to evaluate the performance of single models and ensemble models, with PCA and LDA, in Q8 PSSP. The novelty of this paper lies in introducing ensemble learning to the Q8 PSSP problem. The experimental results confirmed that ensemble ML models are more accurate than individual ML models. They also indicated that features extracted by LDA are more effective than those extracted by PCA.
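Q3 and Q8 accuracy are simply the fraction of residues whose predicted state matches the observed one. The Python sketch below also includes one common reduction of the 8 DSSP states to 3 (H, G, I to helix; E, B to strand; everything else to coil); other reductions appear in the literature, and which one this paper used is not stated.

```python
# One common DSSP 8-to-3 state reduction (an assumption; conventions vary between studies).
# "-" and "C" both denote coil/other in different data files.
Q8_TO_Q3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E",
            "T": "C", "S": "C", "-": "C", "C": "C"}


def reduce_to_q3(ss8):
    """Map an 8-state secondary structure string to 3 states."""
    return "".join(Q8_TO_Q3[s] for s in ss8)


def q_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


observed8 = "HHGGEEBT"
predicted8 = "HHHHEEES"
print(q_accuracy(predicted8, observed8))  # 0.5 (Q8)
print(q_accuracy(reduce_to_q3(predicted8), reduce_to_q3(observed8)))  # 1.0 (Q3)
```

The same prediction scores 0.5 under Q8 but 1.0 under Q3, which illustrates why Q8 prediction is the harder task.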
Secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is widely needed given the increasing discovery of noncoding RNAs. In recent decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has reached a bottleneck. Here we present a method for RNA secondary structure prediction using direct coupling analysis and a remove-and-expand algorithm that performs better than four popular existing multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
A three-dimensional off-lattice protein model with two species of monomers, hydrophobic and hydrophilic, is studied. Inspired by the law of reciprocity among things in the physical world, a heuristic quasi-physical algorithm for the protein structure prediction problem is put forward. First, by simulating the movement of smooth elastic balls in the physical world, the algorithm finds low-energy configurations for a given monomer chain. An "off-trap" strategy is then proposed to escape from local minima. Experimental results show promising performance: for all chains with lengths 13 ≤ n ≤ 55, the proposed algorithm finds states with lower energy than the putative ground states reported in the literature. Furthermore, for chain lengths n = 21, 34, and 55, the algorithm finds new low-energy configurations different from those given in the literature.
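The abstract does not state the energy function. A widely used three-dimensional off-lattice AB model combines bending and torsion-like terms over the unit bond vectors b_k with a species-dependent Lennard-Jones term, using κ1 = -1, κ2 = 0.5 and a coupling C = 1 for A-A pairs and 0.5 otherwise. The Python sketch below assumes that form (A hydrophobic, B hydrophilic) and only evaluates the energy that a search such as the quasi-physical algorithm would minimize; it does not implement the elastic-ball moves or the off-trap strategy.

```python
import math

KAPPA1 = -1.0  # assumed coefficient of the bending term
KAPPA2 = 0.5   # assumed coefficient of the torsion-like term


def coupling(a, b):
    """Assumed species-dependent LJ coefficient: 1 for A-A, 0.5 for A-B and B-B."""
    return 1.0 if a == "A" and b == "A" else 0.5


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def ab_energy(seq, coords):
    """E = -k1*sum(b_k.b_{k+1}) - k2*sum(b_k.b_{k+2}) + 4*sum_{j>=i+2} C*(r^-12 - r^-6)."""
    n = len(seq)
    # unit bond vectors between consecutive monomers
    bonds = [unit([coords[k + 1][t] - coords[k][t] for t in range(3)]) for k in range(n - 1)]
    energy = -KAPPA1 * sum(dot(bonds[k], bonds[k + 1]) for k in range(n - 2))
    energy -= KAPPA2 * sum(dot(bonds[k], bonds[k + 2]) for k in range(n - 3))
    # Lennard-Jones interaction between all non-bonded monomer pairs
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = math.dist(coords[i], coords[j])
            energy += 4.0 * coupling(seq[i], seq[j]) * (r ** -12 - r ** -6)
    return energy


straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(ab_energy("AAA", straight))  # 0.9384765625
```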
This letter describes the architecture of a BioAccel (internal code) chip for RNA secondary structure prediction. The system is based on a BioBus (internal code), whose distinguishing features are two separate control and data channels and a slave-associated arbitration scheme. Two reference systems based on the AMBA AHB bus and the CoreConnect bus are introduced to evaluate the performance of the system. The simulation results are promising: the average communication bandwidth of the chip is increased severalfold, and the read and write latencies are reduced by about 40 percent.
A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modified energy parameters for various loops were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were permitted to form where reasonable. We applied this approach to a number of RNA sequences, and the prediction accuracies obtained were higher than those reported in published papers.
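A pseudoknot is present when two base pairs (i, j) and (k, l) cross, that is i < k < j < l; an H-type pseudoknot arises when bases in a hairpin loop pair with bases outside that hairpin. The Python sketch below only flags crossing pairs in a list of base pairs; it does not implement the authors' stepwise folding or their five pseudoknot types, and the example positions are illustrative.

```python
def crossing_pairs(pairs):
    """Return all pairs of base pairs that cross, i.e. i < k < j < l for (i, j), (k, l)."""
    norm = sorted((min(p), max(p)) for p in pairs)  # orient each pair as (5' end, 3' end)
    crossings = []
    for a in range(len(norm)):
        i, j = norm[a]
        for b in range(a + 1, len(norm)):
            k, l = norm[b]
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


# Stem 1 closes a loop containing positions 5 and 6, which pair with bases beyond stem 1.
h_type = [(1, 10), (2, 9), (5, 15), (6, 14)]
print(len(crossing_pairs(h_type)))  # 4
print(crossing_pairs([(1, 10), (2, 9)]))  # [] (nested stem, no pseudoknot)
```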
The hydrophobic-polar (HP) lattice model is an important simplified model for studying protein folding. In this paper, we present an improved ant colony optimization (ACO) algorithm for protein structure prediction. In the algorithm, a "clone" method is applied to deal with infeasible structures, and a "point mutation and reconstruction" method is applied in the local search phase. The empirical results show that the presented method is feasible and effective for protein structure prediction, and notable improvements in CPU time are obtained.
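In the HP model, a conformation is a self-avoiding walk on a lattice and its energy is -1 for every pair of H residues that are lattice neighbours but not consecutive in the chain. The Python sketch below assumes the 2D square lattice with absolute coordinates (the abstract does not say which lattice was used) and rejects infeasible, self-intersecting conformations; it evaluates the objective only and is not the ACO search itself.

```python
def hp_energy(seq, coords):
    """Energy of an HP conformation on the 2D square lattice: -1 per H-H topological contact."""
    if len(seq) != len(coords):
        raise ValueError("one coordinate per residue required")
    if len(set(coords)) != len(coords):
        raise ValueError("self-intersecting conformation (infeasible)")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # look only right and up so each neighbouring pair is counted once
        for nb in ((x + 1, y), (x, y + 1)):
            j = index.get(nb)
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                energy -= 1
    return energy


print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # -1 (folded into a square)
print(hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)]))  # 0 (extended chain)
```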
Structure prediction methods have been widely used as state-of-the-art tools for structure searches and materials discovery, leading to many theory-driven breakthroughs in the discovery of new materials. These methods generally explore the potential energy surfaces of materials through various structure sampling techniques and optimization algorithms in conjunction with quantum mechanical calculations. By taking advantage of the general features of materials' potential energy surfaces and of swarm-intelligence-based global optimization algorithms, we have developed the CALYPSO method for structure prediction, which has been widely used in fields as diverse as computational physics, chemistry, and materials science. In this review, we present the basic theory of the CALYPSO method, with particular emphasis on the principles of its various structure-handling techniques. We also survey the current challenges faced by structure prediction methods and conclude with an outlook on the future development of CALYPSO.
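As an illustration of swarm-intelligence-based global optimization in general, the Python sketch below is a textbook global-best particle swarm optimization (PSO) applied to a toy Lennard-Jones dimer energy. It is not CALYPSO: real structure searches add symmetry-constrained structure generation and local relaxation with quantum mechanical calculations, none of which is modelled here, and the parameter values (inertia w, acceleration constants c1 and c2) are common defaults chosen for illustration.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Global-best PSO: minimize f over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # keep inside the box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def lj_dimer(x):
    """Toy energy surface: Lennard-Jones energy (reduced units) at separation x[0]."""
    r = x[0]
    return 4.0 * (r ** -12 - r ** -6)


best, energy = pso_minimize(lj_dimer, dim=1, bounds=(0.9, 3.0))
print(round(best[0], 3), round(energy, 3))  # separation near 2**(1/6), energy near -1
```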
In recent years, in silico epitope prediction tools have significantly facilitated vaccine development, and many have been applied successfully to predict epitopes in viruses. Here, a general overview of the currently available tools, including T-cell and B-cell epitope prediction tools, is presented, and the principles of the different prediction algorithms are briefly reviewed. Finally, several examples are presented to illustrate the application of these prediction tools.
Cluster science, as a bridge linking atomic and molecular physics with condensed matter physics, has inspired the development of nanomaterials over the past decades, ranging from single-atom catalysis to ligand-protected noble metal clusters. These studies have not been restricted to the search for the geometrical structures of clusters but have also promoted the development of cluster-assembled materials, in which clusters serve as building blocks. The CALYPSO cluster prediction method, combined with other computational techniques, has significantly stimulated the development of cluster-based nanomaterials. In this review, we summarize representative cluster structures predicted by the CALYPSO method that have also been confirmed by photoelectron spectroscopy experiments. Beginning with alkali-metal clusters, which serve as benchmarks, a series of studies is presented on size-dependent elemental clusters with relatively high stability and interesting chemical and physical properties. Special attention is paid to boron-based clusters because of their promising applications. The NbSi12 and BeB16 clusters, for example, are two classic representatives of silicon- and boron-based clusters, which can be viewed as building blocks of nanotubes and borophene, respectively. This review offers a detailed description of the structural evolution and electronic properties of medium-sized pure and doped clusters, which will advance fundamental knowledge of cluster-based nanomaterials and provide valuable information for further theoretical and experimental studies.
The advantages and disadvantages of the genetic algorithm (GA) and the back-propagation (BP) algorithm are introduced. A neural network based on a GA-BP algorithm, which combines the advantages of BP and GA, is proposed and applied to protein secondary structure prediction. Training and prediction were carried out separately for 4 structural classes of proteins to obtain a higher prediction rate: the highest prediction rate was 75.65% and the average prediction rate was 65.04%.
Phytase is an enzyme that hydrolyzes phytic acid and its salts to produce inositol and phosphoric acid. As a new feed additive, phytase has great potential in animal nutrition and environmental protection. Because of its good stability, suitability for large-scale production, and high activity, microbial phytase has become a focus of industrial application. Here, we report the predicted structure and enzymatic properties of a phytase from Bacillus subtilis, named phyS. The optimal temperature was 35 °C and the optimal pH was 8. The enzyme activity remained above 90% over the range of pH 8 to 9, demonstrating that phyS is an alkaline phytase. This study lays a foundation for the extensive application of phyS.
BACKGROUND Circular RNAs (circRNAs) are involved in the pathogenesis of many diseases through competing endogenous RNA (ceRNA) regulatory mechanisms. AIM To investigate a circRNA-related ceRNA regulatory network and a new circRNA-based predictive model, in order to understand the diagnostic role of circRNAs in ulcerative colitis (UC). METHODS We obtained gene expression profiles of circRNAs, miRNAs, and mRNAs in UC from the Gene Expression Omnibus dataset. The circRNA-miRNA-mRNA network was constructed based on circRNA-miRNA and miRNA-mRNA interactions. Functional enrichment analysis was performed to identify the biological mechanisms involving the circRNAs. We identified the differentially expressed circRNAs most relevant for diagnosing UC and constructed a new predictive nomogram, whose efficacy was tested with the C-index, the receiver operating characteristic (ROC) curve, and decision curve analysis (DCA). RESULTS A circRNA-miRNA-mRNA regulatory network was obtained, containing 12 circRNAs, three miRNAs, and 38 mRNAs. Two optimal prognosis-related differentially expressed circRNAs, hsa_circ_0085323 and hsa_circ_0036906, were included to construct a predictive nomogram. The model showed good discrimination, with a C-index of 1 (> 0.9, high accuracy). ROC and DCA suggested that the nomogram had good diagnostic ability. CONCLUSION This novel predictive nomogram incorporating hsa_circ_0085323 and hsa_circ_0036906 can be conveniently used to predict the risk of UC. The circRNA-miRNA-mRNA network in UC may have clinical significance.
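For a binary outcome such as UC versus control, the C-index equals the area under the ROC curve: the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The Python sketch below computes it directly from that definition; the scores and labels are made up for illustration.

```python
def c_index_binary(scores, labels):
    """Concordance index for a binary outcome (equivalent to ROC AUC).

    scores: predicted risk scores; labels: 1 for cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for p in cases:
        for n in controls:
            if p > n:
                concordant += 1.0  # case ranked above control
            elif p == n:
                concordant += 0.5  # tie counts as half
    return concordant / (len(cases) * len(controls))


print(c_index_binary([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0
print(c_index_binary([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0]))  # 0.75
```

A C-index of 1, as reported above, means every case was scored above every control in the evaluated data.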
Objective: To identify the full-length cDNA sequence of lactate dehydrogenase (LDH) from adult Echinococcus granulosus (E. granulosus) and to predict the structure and function of the encoded protein using bioinformatics methods. Methods: Using NCBI, EMBL, ExPASy and other online resources, the open reading frame (ORF), conserved domain, physicochemical parameters, signal peptide, epitopes and topology of the protein sequence were predicted, and a homology-based tertiary structure model was created. Vector NTI software was used for sequence alignment, phylogenetic tree construction and tertiary structure prediction. Results: The target sequence was 1,233 bp long, with a longest ORF of 996 bp encoding a 331-amino-acid protein containing a typical L-LDH conserved domain. It was confirmed as the full-length cDNA of LDH from E. granulosus and named EgLDH (GenBank accession number: HM748917). The predicted molecular weight and isoelectric point of the deduced protein were 35,516.2 Da and 6.32, respectively. Compared with the LDHs of Taenia solium, Taenia saginata asiatica, Spirometra erinaceieuropaei, Schistosoma japonicum, Clonorchis sinensis and human, it showed similarities of 86%, 85%, 55%, 58%, 58% and 53%, respectively. EgLDH contained 3 putative transmembrane regions and 4 major epitopes (aa 54-59, 81-87, 97-102 and 307-313), which differed significantly from the corresponding regions of human LDH. In addition, some NAD- and substrate-binding sites were located on epitopes 54-59 and 97-102, respectively. Tertiary structure prediction showed that 3 key catalytic residues, 105R, 165D and 192H, form a catalytic center near epitope 97-102, with most NAD- and substrate-binding sites located around this center. Conclusions: The full-length cDNA sequence of EgLDH was identified. It encodes a putative transmembrane protein that might be an ideal target molecule for vaccines and drugs.
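The ORF figures above are consistent: a 996 bp ORF that includes its stop codon holds 332 codons and therefore encodes 331 amino acids. The Python sketch below finds the longest ATG-to-stop ORF in the three forward reading frames and converts its length to a protein length. It is a simplification, assuming the standard genetic code and ignoring the reverse strand, and is not the tool used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq):
    """Return (start, end, length_bp) of the longest ATG...stop ORF on the forward strand.

    `end` is exclusive and the length includes the stop codon; (0, 0, 0) means none found.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if length > best[2]:
                    best = (start, i + 3, length)
                start = None  # look for the next ATG downstream
    return best


def protein_length(orf_bp):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    return orf_bp // 3 - 1


print(longest_orf("CCATGAAATTTGGGTAACC"))  # (2, 17, 15)
print(protein_length(996))  # 331
```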
Accurate identification of the correct, biologically relevant RNA structures is critical to understanding various aspects of RNA biology, since proper folding is key to the functionality of all types of RNA molecules and plays pivotal roles in many essential biological processes. Thus, a plethora of approaches have been developed to predict, identify, or solve RNA structures based on various computational, molecular, genetic, chemical, or physicochemical strategies. Purely computational approaches hold distinct advantages over all other strategies in terms of ease of implementation, time, speed, cost, and throughput, but they strongly underperform in accuracy, which significantly limits their broader application. Nonetheless, the advantages of these methods have led to the steady development of multiple in silico RNA secondary structure prediction approaches, including recent deep learning-based programs. Here, we compared the accuracy of seven in silico RNA folding tools in predicting the biologically relevant secondary structures of dozens of self-cleaving ribozyme sequences, in tasks of varying complexity. We found that while many programs performed well in relatively simple tasks, their performance varied significantly in more complex RNA folding problems. In general, however, a modern deep learning method outperformed the other programs in the complex tasks, at least for the specific class of sequences tested, suggesting that it may represent the future of RNA structure prediction algorithms.
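Comparisons like this one are typically scored per base pair: sensitivity (fraction of reference pairs recovered), positive predictive value (fraction of predicted pairs that are correct) and their harmonic mean, F1. The Python sketch below parses dot-bracket strings, with extra bracket types so that pseudoknotted structures common in ribozymes can be written, and scores exact pair matches only; whether the study allowed shifted pairs is not stated here.

```python
CLOSE_TO_OPEN = {")": "(", "]": "[", "}": "{", ">": "<"}


def pairs_from_dot_bracket(db):
    """Parse dot-bracket notation into a set of (i, j) base pairs (0-based, i < j).

    Extra bracket types ([], {}, <>) allow pseudoknotted structures to be written.
    """
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    pairs = set()
    for i, ch in enumerate(db):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.add((stack.pop(), i))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def base_pair_scores(predicted, reference):
    """Sensitivity, PPV and F1 of predicted base pairs against a reference structure."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    tp = len(pred & ref)
    sensitivity = tp / len(ref) if ref else 0.0
    ppv = tp / len(pred) if pred else 0.0
    f1 = 2 * sensitivity * ppv / (sensitivity + ppv) if sensitivity + ppv else 0.0
    return sensitivity, ppv, f1


print(base_pair_scores("(((......)))", "((((....))))"))  # sensitivity 0.75, PPV 1.0, F1 about 0.857
```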
Funding (stochastic grammar cloud model for RNA secondary structure prediction): Supported by the Science Foundation of Hengyang Normal University of China (09A36).
Funding (MemBrain): Supported by the National Natural Science Foundation of China (Nos. 61671288, 91530321, 61603161) and the Science and Technology Commission of Shanghai Municipality (Nos. 16JC1404300, 17JC1403500, 16ZR1448700).
Funding (review of computational models for RNA structure prediction): Supported by the National Natural Science Foundation of China (Grant Nos. 11074191, 11175132, and 11374234), the National Basic Research Program of China (Grant No. 2011CB933600), and the Program for New Century Excellent Talents of China (Grant No. NCET 08-0408).
Funding (review of evolutionary algorithms for protein structure prediction): Supported by the National Natural Science Foundation of China (60133010, 70071042, 60073043).
Funding (crystal structure prediction of 4,4'-bis-(2-hydroxy-ethoxyl)-biphenyl): This work was supported by the National Science Foundation (Grant DMR-9812351).
Funding (RNA secondary structure prediction by direct coupling analysis): Project supported by the National Natural Science Foundation of China (Grant No. 31570722).
Funding: The National Natural Science Foundation of China (No. 10471051) and the National Basic Research Program (973) of China (No. 2004CB318000)
Abstract: A three-dimensional off-lattice protein model with two species of monomers, hydrophobic and hydrophilic, is studied. Inspired by the law of reciprocity among things in the physical world, we put forward a heuristic quasi-physical algorithm for the protein structure prediction problem. First, by carefully simulating the movement of smooth elastic balls in the physical world, the algorithm finds low-energy configurations for a given monomer chain. An "off-trap" strategy is then proposed to escape from local minima. Experimental results show promising performance. For all chains with lengths 13 ≤ n ≤ 55, the proposed algorithm finds states with lower energy than the putative ground states reported in the literature. Furthermore, for chain lengths n = 21, 34 and 55, the algorithm finds new low-energy configurations different from those reported in the literature.
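The abstract does not state the energy function being minimized. The sketch below assumes the widely used 3D AB off-lattice model, where A is hydrophobic and B is hydrophilic. The model has unit bond lengths, a bond-angle term, a term coupling next-nearest bonds, and a species-dependent Lennard-Jones term between non-bonded monomers. The defaults κ1 = -1, κ2 = 0.5, C(A,A) = 1 and C(A,B) = C(B,B) = 0.5 are the values commonly quoted for this model. They are assumptions here and should be checked against the paper. An optimizer such as the quasi-physical algorithm would minimize this function over the monomer coordinates.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _unit(a: Vec, b: Vec) -> Vec:
    """Unit bond vector pointing from monomer a to monomer b."""
    d = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    n = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
    return (d[0] / n, d[1] / n, d[2] / n)


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def coupling(si: str, sj: str) -> float:
    """C(A,A) = 1, C(A,B) = C(B,B) = 0.5 (assumed AB-model values); A = hydrophobic."""
    return 1.0 if si == sj == "A" else 0.5


def ab_energy(seq: str, coords: Sequence[Vec],
              k1: float = -1.0, k2: float = 0.5) -> float:
    """Energy of a chain under the assumed 3D AB off-lattice potential."""
    n = len(seq)
    bonds = [_unit(coords[i], coords[i + 1]) for i in range(n - 1)]
    energy = -k1 * sum(_dot(bonds[i], bonds[i + 1]) for i in range(n - 2))
    energy -= k2 * sum(_dot(bonds[i], bonds[i + 2]) for i in range(n - 3))
    for i in range(n - 2):
        for j in range(i + 2, n):  # Lennard-Jones-like term for non-bonded pairs
            r = math.dist(coords[i], coords[j])
            energy += 4.0 * coupling(seq[i], seq[j]) * (r ** -12 - r ** -6)
    return energy


# Straight three-monomer chain with unit bonds.
print(ab_energy("AAA", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))  # 0.9384765625
```
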
Funding: Supported by the National Natural Science Foundation of China (No. 60373044) and the Knowledge Innovative Project of CAS (No. KSCX2-SW-233).
Abstract: This letter describes the architecture of BioAccel (internal code), a chip for RNA secondary structure prediction. The system is based on BioBus (internal code), whose distinguishing features are two separate control and data channels and a slave-associated arbitration scheme. Two reference systems based on the AMBA AHB bus and the CoreConnect bus are introduced to evaluate the performance of the system. The simulation results are promising: the average communication bandwidth of the chip is increased severalfold, and the read and write latencies are reduced by about 40 percent.
Abstract: A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modified energy parameters for various loop types were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were allowed to form where reasonable. We have applied this approach to a number of RNA sequences, and the prediction accuracies we obtained were higher than those in published papers.
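The sketch below illustrates the stem-by-stem idea behind stepwise folding. It is a deliberate simplification, not the authors' program. Stems are ranked by length rather than by free energy, and no loop energy parameters are used. The pairing rules (Watson-Crick plus G-U) and the minimum hairpin loop of 3 are assumptions. Crossing stems are not rejected, so an H-type pseudoknot can emerge when two compatible stems interleave.

```python
from typing import List, Tuple

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
Stem = Tuple[int, int, int]  # (outer i, outer j, number of stacked pairs)


def find_stems(seq: str, min_len: int = 3, min_loop: int = 3) -> List[Stem]:
    """Enumerate maximal helices of stacked pairs (i+k, j-k)."""
    n = len(seq)
    stems: List[Stem] = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue  # any helix through (i, j) starts further out
            k = 0
            while (j - k) - (i + k) - 1 >= min_loop and (seq[i + k], seq[j - k]) in PAIRS:
                k += 1
            if k >= min_len:
                stems.append((i, j, k))
    return stems


def stepwise_fold(seq: str, min_len: int = 3) -> List[Stem]:
    """Add stems one at a time, longest first, skipping any that reuse a base.

    Crossing stems are not rejected, so H-type pseudoknots can form.
    """
    used = set()
    chosen: List[Stem] = []
    for i, j, k in sorted(find_stems(seq, min_len), key=lambda s: s[2], reverse=True):
        bases = {i + t for t in range(k)} | {j - t for t in range(k)}
        if bases & used:
            continue
        chosen.append((i, j, k))
        used |= bases
    return sorted(chosen)


print(stepwise_fold("GGGAAAACCC"))  # [(0, 9, 3)]
```
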
Abstract: The hydrophobic-polar (HP) lattice model is an important simplified model for studying protein folding. In this paper, we present an improved ant colony optimization (ACO) algorithm for protein structure prediction. In the algorithm, the "lone" method is applied to handle infeasible structures, and the "point mutation and reconstruction" method is applied in the local search phase. The empirical results show that the presented method is feasible and effective for the protein structure prediction problem, with notable improvements in CPU time.
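The HP model energy itself is simple. Each pair of H residues that are lattice neighbors but not consecutive in the chain contributes -1. The sketch below assumes a 2D square lattice and the relative-move encoding often used by ACO solvers for this problem (S = straight, L = left, R = right). The abstract specifies neither. A self-intersecting path returns None, which marks the kind of infeasible structure the algorithm has to handle.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, N, W, S (counter-clockwise)


def fold(moves: str) -> Optional[List[Coord]]:
    """Build a 2D square-lattice conformation from relative moves.

    The first bond points east; each move is S (straight), L (left) or R (right).
    Returns None for infeasible (self-intersecting) conformations.
    """
    d = 0
    coords: List[Coord] = [(0, 0), (1, 0)]
    occupied = set(coords)
    for m in moves:
        if m == "L":
            d = (d + 1) % 4
        elif m == "R":
            d = (d - 1) % 4
        elif m != "S":
            raise ValueError(f"unknown move {m!r}")
        x, y = coords[-1]
        nxt = (x + DIRS[d][0], y + DIRS[d][1])
        if nxt in occupied:
            return None
        coords.append(nxt)
        occupied.add(nxt)
    return coords


def hp_energy(seq: str, coords: List[Coord]) -> int:
    """-1 for each H-H pair that are lattice neighbors but not chain neighbors."""
    index: Dict[Coord, int] = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look only right/up so each contact counts once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                energy -= 1
    return energy


square = fold("LL")  # HPPH folded into a unit square
print(square, hp_energy("HPPH", square))  # [(0, 0), (1, 0), (1, 1), (0, 1)] -1
print(fold("LLL"))   # None: the fifth residue would land on the first
```
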
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11534003 and 11604117), the National Key Research and Development Program of China (Grant No. 2016YFB0201201), the Program for JLU Science and Technology Innovative Research Team (JLUSTIRT) of China, and the Science Challenge Project of China (Grant No. TZ2016001).
Abstract: Structure prediction methods have been widely used as state-of-the-art tools for structure searches and materials discovery, leading to many theory-driven breakthroughs in the discovery of new materials. These methods generally explore the potential energy surfaces of materials through various structure sampling techniques and optimization algorithms in conjunction with quantum mechanical calculations. By taking advantage of the general features of materials' potential energy surfaces and swarm-intelligence-based global optimization algorithms, we have developed the CALYPSO method for structure prediction, which has been widely used in fields as diverse as computational physics, chemistry and materials science. In this review, we present the basic theory of the CALYPSO method, with particular emphasis on the principles of its various structure-handling techniques. We also survey the current challenges faced by structure prediction methods and conclude with an outlook on the future development of CALYPSO.
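CALYPSO's global search is built on particle swarm optimization (PSO). The sketch below is a generic PSO minimizer, not CALYPSO's implementation. A cheap toy function stands in for the quantum mechanical energy. CALYPSO-specific ingredients are omitted: symmetry-constrained structure generation, structure fingerprinting and local relaxation. The inertia weight w = 0.7 and acceleration coefficients c1 = c2 = 1.5 are conventional choices, assumed here.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(f: Callable[[Sequence[float]], float], dim: int,
                 bounds: Tuple[float, float] = (-5.0, 5.0), n_particles: int = 20,
                 iters: int = 200, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO: each particle is pulled toward its own and the swarm's best."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] = min(hi, max(lo, xs[i][d] + vs[i][d]))  # stay in the box
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], fx
                if fx < gbest_f:
                    gbest, gbest_f = xs[i][:], fx
    return gbest, gbest_f


def toy_energy(x: Sequence[float]) -> float:
    """Stand-in for a DFT total energy: a bowl with its minimum at the origin."""
    return sum(t * t for t in x)


best_x, best_e = pso_minimize(toy_energy, dim=3)
print(best_x, best_e)
```

In a real structure search, each evaluation of `f` would be an expensive local relaxation. This is why methods like CALYPSO invest in generating diverse, symmetry-aware candidates rather than running many cheap iterations.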
Funding: The National Natural Science Foundation of China (30870131) and the National Key Projects in the Infectious Fields (2008ZX10002-011, 2008ZX10004-004)
Abstract: In recent years, in silico epitope prediction tools have significantly facilitated vaccine development, and many have been successfully applied to predict epitopes in viruses. Herein, we present a general overview of the tools currently available, including T-cell and B-cell epitope prediction tools, and briefly review the principles of the different prediction algorithms. Finally, several examples are presented to illustrate the application of these prediction tools.
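Many early B-cell epitope predictors slide a window along the sequence and average a residue propensity scale (hydrophilicity, flexibility, surface accessibility). The sketch below illustrates that principle under stated assumptions. It uses the Kyte-Doolittle hydropathy scale, where low values mean hydrophilic, as a stand-in for a dedicated hydrophilicity scale, together with an arbitrary threshold of -1.0. Real tools use purpose-built scales or trained models.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def window_scores(seq: str, window: int = 7) -> List[float]:
    """Mean hydropathy of every full-length window, indexed by window start."""
    return [sum(KD[a] for a in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def hydrophilic_windows(seq: str, window: int = 7,
                        threshold: float = -1.0) -> List[Tuple[int, int]]:
    """Half-open (start, end) spans whose mean hydropathy is at or below threshold."""
    return [(i, i + window)
            for i, s in enumerate(window_scores(seq, window)) if s <= threshold]


print(hydrophilic_windows("DEKIIV", window=3))  # [(0, 3)]
```
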
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. U1804121 and 11304167).
Abstract: Cluster science, as a bridge between atomic and molecular physics and condensed matter physics, has inspired the development of nanomaterials over the past decades, ranging from single-atom catalysis to ligand-protected noble metal clusters. The corresponding studies have not been restricted to searching for the geometric structures of clusters; they have also promoted the development of cluster-assembled materials that use clusters as building blocks. The CALYPSO cluster prediction method, combined with other computational techniques, has significantly stimulated the development of cluster-based nanomaterials. In this review, we summarize representative cluster structures predicted by the CALYPSO method that have also been successfully identified by photoelectron spectroscopy experiments. Beginning with alkali-metal clusters, which serve as benchmarks, we review a series of studies on size-dependent elemental clusters that possess relatively high stability and interesting chemical and physical properties. Special attention is paid to boron-based clusters because of their promising applications. The NbSi12 and BeB16 clusters, for example, are two classic representatives of silicon- and boron-based clusters, which can be viewed as building blocks of nanotubes and borophene. This review offers a detailed description of the structural evolution and electronic properties of medium-sized pure and doped clusters, which will advance fundamental knowledge of cluster-based nanomaterials and provide valuable information for further theoretical and experimental studies.
Abstract: The advantages and disadvantages of the genetic algorithm (GA) and the back-propagation (BP) algorithm are introduced. A neural network based on a GA-BP algorithm, which combines the advantages of BP and GA, is proposed and applied to protein secondary structure prediction. The network is trained and used for prediction separately for each of 4 structural classes of proteins in order to obtain a higher prediction rate: the highest prediction rate is 75.65% and the average prediction rate is 65.04%.
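The GA-BP combination can be sketched in two stages. First, a genetic algorithm searches over whole weight vectors to find a good starting point. Then back-propagation refines that starting point by gradient descent. This minimal sketch assumes a 2-input, 3-hidden-unit sigmoid network trained on XOR as a stand-in for encoded sequence windows. The population size, mutation scale, learning rate and epoch counts are illustrative, not the paper's settings.

```python
import math
import random

# Toy XOR data standing in for encoded sequence windows (assumption).
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, h, x):
    """2-input, h-hidden, 1-output net; w is a flat list of 4*h + 1 weights."""
    hid = [sigmoid(w[3*k] * x[0] + w[3*k + 1] * x[1] + w[3*k + 2]) for k in range(h)]
    out = sigmoid(sum(w[3*h + k] * hid[k] for k in range(h)) + w[4*h])
    return hid, out


def mse(w, h, data):
    return sum((forward(w, h, x)[1] - y) ** 2 for x, y in data) / len(data)


def ga_init(data, h, rng, pop=30, gens=40, sigma=0.3):
    """GA stage: evolve whole weight vectors to get a good starting point for BP."""
    n = 4 * h + 1
    population = [[rng.uniform(-2, 2) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda w: mse(w, h, data))
        elite = population[: pop // 2]  # truncation selection (keeps the best)
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation.
            children.append([rng.choice((ai, bi)) + rng.gauss(0, sigma)
                             for ai, bi in zip(a, b)])
        population = elite + children
    return min(population, key=lambda w: mse(w, h, data))


def backprop(w, h, data, lr=0.5, epochs=2000):
    """BP stage: full-batch gradient descent on 0.5 * sum of squared errors."""
    w = list(w)
    best_w, best_loss = list(w), mse(w, h, data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            hid, out = forward(w, h, x)
            d_out = (out - y) * out * (1 - out)
            grad[4*h] += d_out
            for k in range(h):
                grad[3*h + k] += d_out * hid[k]
                d_hid = d_out * w[3*h + k] * hid[k] * (1 - hid[k])
                grad[3*k] += d_hid * x[0]
                grad[3*k + 1] += d_hid * x[1]
                grad[3*k + 2] += d_hid
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        loss = mse(w, h, data)
        if loss < best_loss:  # keep the best weights seen so far
            best_w, best_loss = list(w), loss
    return best_w


rng = random.Random(0)
w_ga = ga_init(XOR, 3, rng)
w_bp = backprop(w_ga, 3, XOR)
print(mse(w_ga, 3, XOR), mse(w_bp, 3, XOR))
```

Because BP keeps the best weights it has seen, the refined network is never worse than the GA seed on the training data. The GA supplies the global exploration that plain gradient descent lacks.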
Abstract: Phytase is an enzyme that hydrolyzes phytic acid and its salts to produce inositol and phosphoric acid. As a new feed additive, phytase has great potential in animal nutrition and environmental protection. Because of its good stability, suitability for large-scale production and high activity, microbial phytase has become a focus of industrial application. Here, we report the predicted structure and enzymatic properties of a phytase from Bacillus subtilis, named phyS. The optimal temperature was 35 °C and the optimal pH was 8. Moreover, enzyme activity remained above 90% in the range of pH 8-9, demonstrating that phyS is an alkaline phytase. This study lays a foundation for the extensive application of phyS.
Funding: Supported by the National Natural Science Foundation of China, No. 81774093, No. 81904009, No. 81974546 and No. 82174182; and the Key R&D Project of Hubei Province, No. 2020BCB001.
Abstract: BACKGROUND: Circular RNAs (circRNAs) are involved in the pathogenesis of many diseases through competing endogenous RNA (ceRNA) regulatory mechanisms. AIM: To investigate a circRNA-related ceRNA regulatory network and build a new circRNA-based predictive model, in order to understand the diagnostic role of circRNAs in ulcerative colitis (UC). METHODS: We obtained gene expression profiles of circRNAs, miRNAs and mRNAs in UC from the Gene Expression Omnibus database. The circRNA-miRNA-mRNA network was constructed based on circRNA-miRNA and miRNA-mRNA interactions. Functional enrichment analysis was performed to identify the biological mechanisms in which the circRNAs are involved. We identified the differentially expressed circRNAs most relevant for diagnosing UC and constructed a new predictive nomogram, whose efficacy was tested with the C-index, the receiver operating characteristic (ROC) curve and decision curve analysis (DCA). RESULTS: A circRNA-miRNA-mRNA regulatory network was obtained, containing 12 circRNAs, three miRNAs and 38 mRNAs. Two optimal prognosis-related differentially expressed circRNAs, hsa_circ_0085323 and hsa_circ_0036906, were used to construct a predictive nomogram. The model showed good discrimination, with a C-index of 1 (a value above 0.9 indicates high accuracy). ROC and DCA suggested that the nomogram had good diagnostic ability. CONCLUSION: This novel predictive nomogram incorporating hsa_circ_0085323 and hsa_circ_0036906 can be conveniently used to predict the risk of UC. The circRNA-miRNA-mRNA network in UC may have clinical significance.
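For a binary diagnosis such as UC versus control, the C-index reduces to the ROC AUC. It is the fraction of (case, control) pairs in which the case receives the higher predicted risk, with ties counting one half. A C-index of 1 therefore means every case was scored above every control. Below is a minimal sketch of that definition; the scores are made up for illustration.

```python
from typing import Sequence


def c_index(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Concordance index for a binary outcome (1 = case, 0 = control)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5  # tied predictions count as half-concordant
    return concordant / (len(cases) * len(controls))


print(c_index([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0 (perfect separation)
print(c_index([0.6, 0.6], [1, 0]))                  # 0.5 (uninformative tie)
```

With small cohorts, a perfect C-index of 1 is easy to reach and may reflect overfitting. External validation is the usual check.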
Funding: Supported by the National Natural Science Foundation of China (No. 30860070)
Abstract: Objective: To identify the full-length cDNA sequence of lactate dehydrogenase (LDH) from adult Echinococcus granulosus (E. granulosus) and to predict the structure and function of the encoded protein using bioinformatics methods. Methods: With the help of NCBI, EMBI, Expasy and other online sites, the open reading frame (ORF), conserved domain, physical and chemical parameters, signal peptide, epitopes and topological structure of the protein sequence were predicted, and a homology-based tertiary structure model was created. Vector NTI software was used for sequence alignment, phylogenetic tree construction and tertiary structure prediction. Results: The target sequence was 1 233 bp long, with a largest ORF of 996 bp encoding a 331-amino-acid protein with a typical L-LDH conserved domain. It was confirmed as the full-length cDNA of LDH from E. granulosus and named EgLDH (GenBank accession number: HM748917). The predicted molecular weight and isoelectric point of the deduced protein were 35 516.2 Da and 6.32, respectively. Compared with LDHs from Taenia solium, Taenia saginata asiatica, Spirometra erinaceieuropaei, Schistosoma japonicum, Clonorchis sinensis and human, it showed similarities of 86%, 85%, 55%, 58%, 58% and 53%, respectively. EgLDH contained 3 putative transmembrane regions and 4 major epitopes (54aa-59aa, 81aa-87aa, 97aa-102aa, 307aa-313aa), which differed significantly from the corresponding regions of human LDH. In addition, some NAD and substrate binding sites were located on epitopes 54aa-59aa and 97aa-102aa, respectively. Tertiary structure prediction showed that 3 key catalytic residues, 105R, 165D and 192H, form a catalytic center near epitope 97aa-102aa, with most NAD and substrate binding sites located around this center. Conclusions: The full-length cDNA sequence of EgLDH was identified. It encodes a putative transmembrane protein that might be an ideal target molecule for vaccines and drugs.
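The ORF step and the length arithmetic can be checked with a small sketch. A 996 bp ORF, including its stop codon, encodes (996 - 3) / 3 = 331 amino acids, which matches the reported protein length. The finder below rests on stated assumptions: it scans only the three forward frames, requires an ATG start codon and uses the standard stop codons. Real ORF finders also scan the reverse strand and may allow alternative start codons.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; (0, 0) means none was found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


def protein_length(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    return orf_bp // 3 - 1


start, end = longest_orf("CCATGAAATTTTAAGG")
print(start, end, protein_length(end - start))  # 2 14 3
print(protein_length(996))                      # 331
```
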
Funding: Supported by the National Natural Science Foundation of China (Grant No. 32000462 to Fei Qi, Grant No. 32170619 to Philipp Kapranov and Grant No. 32201055 to Yue Chen), the Research Fund for International Senior Scientists from the National Natural Science Foundation of China (Grant No. 32150710525 to Philipp Kapranov), the Natural Science Foundation of Fujian Province, China (Grant No. 2020J02006 to Philipp Kapranov), and the Scientific Research Funds of Huaqiao University, China (Grant No. 22BS114 to Fei Qi, Grant No. 21BS127 to Yue Chen, and Grant No. 15BS101 to Philipp Kapranov).
Abstract: Accurate identification of the correct, biologically relevant RNA structures is critical to understanding various aspects of RNA biology, since proper folding is key to the functionality of all types of RNA molecules and plays pivotal roles in many essential biological processes. Thus, a plethora of approaches have been developed to predict, identify or solve RNA structures based on various computational, molecular, genetic, chemical or physicochemical strategies. Purely computational approaches hold distinct advantages over all other strategies in ease of implementation, time, speed, cost and throughput, but they strongly underperform in accuracy, which significantly limits their broader application. Nonetheless, these advantages have led to the steady development of multiple in silico RNA secondary structure prediction approaches, including recent deep learning-based programs. Here, we used seven in silico RNA folding prediction tools to predict the biologically relevant secondary structures of dozens of self-cleaving ribozyme sequences in tasks of varying complexity, and compared their accuracy. We found that while many programs performed well in relatively simple tasks, their performance varied significantly in more complex RNA folding problems. However, in general, a modern deep learning method outperformed the other programs in predicting RNA secondary structures in the complex tasks, at least for the specific class of sequences tested, suggesting that it may represent the future of RNA structure prediction algorithms.
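The abstract does not name its accuracy metric. A common choice for comparing predicted and reference secondary structures is base-pair sensitivity, positive predictive value (PPV) and their harmonic mean (F1). The minimal sketch below assumes that both structures are given in plain dot-bracket notation without pseudoknot brackets.

```python
from typing import Set, Tuple


def pairs_from_dot_bracket(db: str) -> Set[Tuple[int, int]]:
    """Parse plain dot-bracket notation into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_scores(pred: str, ref: str) -> Tuple[float, float, float]:
    """Return (sensitivity, PPV, F1) of predicted pairs against reference pairs."""
    p, r = pairs_from_dot_bracket(pred), pairs_from_dot_bracket(ref)
    tp = len(p & r)
    sens = tp / len(r) if r else 0.0
    ppv = tp / len(p) if p else 0.0
    f1 = 2 * sens * ppv / (sens + ppv) if sens + ppv else 0.0
    return sens, ppv, f1


print(base_pair_scores("(....)", "((..))"))  # sensitivity 0.5, PPV 1.0, F1 about 0.667
```

Exact pair matching is strict. Some benchmarks also credit pairs shifted by one nucleotide, which can change program rankings on ribozyme-like structures.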