Crystal structure prediction algorithms have become powerful tools for materials discovery in recent years; however, they are usually limited to relatively small systems. The main challenge is that the number of local minima grows exponentially with system size. In this work, we propose two crossover-mutation schemes based on graph theory that accelerate evolutionary structure searching through automatic decomposition. These schemes detect molecules or clusters inside periodic networks using quotient graphs for crystals, and the decomposition can dramatically reduce the search space. Test examples, including the high-pressure phases of methane, ammonia, MgAl2O4, and boron, show that the new evolution schemes significantly improve the success rate and search efficiency compared with the standard method in both isolated and extended systems.
RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help to understand RNA structures and their stabilizing factors, thus providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although their usage in protein-related fields has a long history, the use of ML methods in predicting RNA tertiary structures is new and rare. Here, we review recent advances in using ML methods for RNA structure prediction and discuss the advantages, limitations, difficulties, and potential of these approaches when applied in the field.
Building on structure prediction methods, machine learning is used instead of density functional theory (DFT) to predict material properties, thereby accelerating the material search process. In this paper, we established a data set of carbon materials by high-throughput calculation, using available carbon structures obtained from the Samara Carbon Allotrope Database. We then trained a machine learning (ML) model that specifically predicts the elastic moduli (bulk modulus, shear modulus, and Young's modulus) and confirmed that its accuracy is better than that of AFLOW-ML in predicting the elastic moduli of a carbon allotrope. We further combined our ML model with the CALYPSO code to search for new carbon structures with a high Young's modulus. A new carbon allotrope not included in the Samara Carbon Allotrope Database, named Cmcm-C24, which exhibits a hardness greater than 80 GPa, was revealed for the first time. The Cmcm-C24 phase was identified as a semiconductor with a direct bandgap. The structural stability, elastic moduli, and electronic properties of the new carbon allotrope were systematically studied, and the results demonstrate the feasibility of using ML methods to accelerate the material search process.
There is a large gap between the number of membrane protein (MP) sequences and that of their decoded 3D structures, especially high-resolution structures, due to difficulties in crystal preparation of MPs. However, detailed knowledge of the 3D structure is required for a fundamental understanding of the function of an MP and of the interactions between the protein and its inhibitors or activators. In this paper, some computational approaches that have been used to predict MP structures are discussed and compared.
The HP model for protein structure prediction abstracts the fact that hydrophobicity is a dominant force in the protein folding process. This challenging combinatorial optimization problem has been widely addressed through metaheuristics. The evaluation function is a key component for the success of metaheuristics; the poor discrimination of the conventional evaluation function of the HP model has motivated the proposal of alternative formulations for this component. This comparative analysis examines the effectiveness of seven different evaluation functions for the HP model. We analyze the degree of discrimination provided by each of the studied functions, their capability to preserve a rank ordering among potential solutions that is consistent with the original objective of the HP model, and their effect on the performance of local search methods. The results indicate that studying alternative evaluation schemes for the HP model is a highly valuable direction that merits more attention.
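To make the "conventional evaluation function" concrete, the following is a minimal sketch of the standard HP energy, which scores a conformation as minus the number of hydrophobic (H) residue pairs that are lattice neighbors without being chain neighbors. The 2D square lattice, the function name `hp_energy`, and the input format are assumptions made for illustration; the abstract does not specify the lattice or name the seven functions it compares.

```python
def hp_energy(sequence, coords):
    """Conventional HP energy on a 2D square lattice (illustrative assumption).

    sequence: string over {"H", "P"}.
    coords:   list of (x, y) integer lattice points, one per residue.
    Returns minus the number of H-H pairs that touch on the lattice
    but are not consecutive along the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    index_at = {point: i for i, point in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each touching pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


# A four-residue chain folded into a unit square brings residues 0 and 3 together.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hp_energy("HPPH", square), hp_energy("HPPH", line))
```

Because the score only takes small integer values, many distinct conformations share the same energy, which is one way to read the "poor discrimination" the abstract mentions.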
Mineral apatite compounds have attracted significant interest due to their chemical stability and adjustable hexagonal structure, which make them suitable as new photovoltaic functional materials. The band gap of natural apatite is ~5.45 eV, and such a large value limits its applications in catalysis and energy devices. In this research, we designed a method to narrow the band gap via the tetrahedral substitution effect in apatite-based compounds. Density functional theory (DFT) and experimental investigation of the electronic and optical properties revealed that the continuous incorporation of [MO_(4)]^(4-) tetrahedra (M = Si, Ge, Sn, and Mn) into the crystal lattice can significantly reduce the band gap. In particular, this phenomenon was observed when the [MnO_(4)]^(4-) tetrahedron replaces the [PO_(4)]^(4-) tetrahedron, because a Mn 3d-derived conduction band minimum (CBM) forms and interacts with other elements, leading to band broadening and an obvious reduction of the band gap. This approach allowed us to propose a novel scheme for band gap engineering of apatite-based compounds toward modification across the entire spectral range.
Based on the concept of ant colony optimization and the idea of a population in genetic algorithms, a novel global optimization algorithm, called hybrid ant colony optimization (HACO), is proposed in this paper to tackle continuous-space optimization problems. It was compared with other well-known stochastic methods on the optimization of benchmark functions. It was also used to efficiently select an appropriate dilation by optimizing the wavelet power spectrum of the hydrophobic sequence of a protein, which is the key step in using the continuous wavelet transform (CWT) to predict α-helices and connecting peptides.
The secondary structure of a protein is critical for establishing a link between the protein primary and tertiary structures. For this reason, it is important to design methods for accurate protein secondary structure prediction. Most of the existing computational techniques for protein structural and functional prediction are based on machine learning with shallow frameworks. Different deep learning architectures have already been applied to the protein secondary structure prediction problem. In this study, deep learning based models, i.e., a convolutional neural network and a long short-term memory network, were proposed for protein secondary structure prediction. The input to the proposed models is amino acid sequences derived from the CulledPDB dataset. Hyperparameter tuning with cross validation was employed to find the best parameters for the proposed models. The proposed models enable effective processing of amino acid sequences and attain approximately 87.05% and 87.47% Q3 accuracy for the convolutional neural network and long short-term memory models, respectively.
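For readers unfamiliar with the metric, Q3 accuracy is conventionally the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed one. The sketch below assumes that per-residue definition and a single sequence given as a label string; the function name `q3_accuracy` is hypothetical, and the abstract does not state how accuracy was averaged over the dataset.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues with a correctly predicted 3-state label.

    predicted, observed: equal-length strings over {"H", "E", "C"}.
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Three of the five residues agree in this toy example.
print(q3_accuracy("HHHHC", "HHEEC"))
```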
Inverse materials design tackles the challenge of finding materials with desired properties, tailored to specific applications, by combining atomistic simulations and optimization methods. The search for optimal materials requires one to survey large spaces of candidate solids. These spaces of materials can encompass both known and hypothetical compounds. When hypothetical compounds are explored, it becomes crucial to determine which ones are stable (and can be synthesized) and which are not. Crystal structure prediction is a necessary step for theoretically assessing the stability of a hypothetical material and is therefore a crucial step in inverse materials design protocols. Here, we describe how biologically inspired global optimization methods can efficiently predict the stable crystal structure of solids. Specifically, we discuss the application of genetic algorithms to search for optimal atom configurations in systems in which the underlying lattice is given, and of evolutionary algorithms to address the general lattice-type prediction problem.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61671288, 91530321, 61603161) and the Science and Technology Commission of Shanghai Municipality (Nos. 16JC1404300, 17JC1403500, 16ZR1448700).
Abstract: Membrane proteins are an important class of proteins embedded in the membranes of cells. They play crucial roles in living organisms, for example as ion channels, transporters, and receptors. Because it is difficult to determine a membrane protein's structure by wet-lab experiments, accurate and fast computational methods based on the amino acid sequence are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue–residue contacts, and relative accessible surface area of α-helical membrane proteins. MemBrain achieves a prediction accuracy of 97.9% for ATMH and 87.1% for AP, with an N-score of 3.2 ± 3.0 and a C-score of 3.1 ± 2.8. MemBrain-Contact obtains 62% and 64.1% top-L/5 contact prediction accuracy on the training and independent datasets, respectively. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
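The "top L/5" figure can be read with the usual contact-prediction convention: rank all candidate residue pairs by predicted score, keep the L/5 highest (L being the sequence length), and report the fraction that are true contacts. The sketch below assumes that convention; the function name, the dictionary input format, and the omission of any sequence-separation filter are illustrative assumptions, not details taken from the tool itself.

```python
def top_l_over_5_precision(scores, true_contacts, length):
    """Fraction of the top L/5 scored residue pairs that are true contacts.

    scores:        dict mapping (i, j) residue-index pairs to predicted scores.
    true_contacts: set of (i, j) pairs observed to be in contact.
    length:        sequence length L.
    """
    if not scores:
        raise ValueError("no predicted pairs were supplied")
    k = max(1, length // 5)  # keep at least one pair for very short sequences
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# With L = 10, the two best-scored pairs are evaluated; one of them is a true contact.
toy_scores = {(0, 5): 0.9, (1, 6): 0.8, (2, 7): 0.1}
toy_truth = {(0, 5), (2, 7)}
print(top_l_over_5_precision(toy_scores, toy_truth, length=10))
```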
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11074191, 11175132, and 11374234), the National Basic Research Program of China (Grant No. 2011CB933600), and the Program for New Century Excellent Talents of China (Grant No. NCET 08-0408).
Abstract: Many recent exciting discoveries have revealed the versatility of RNAs and their importance in a variety of cellular functions, which are strongly coupled to RNA structures. To understand the functions of RNAs, a number of structure prediction models have been developed in recent years. This review introduces the progress in computational models for RNA structure prediction and discusses the distinguishing features of many outstanding algorithms, with an emphasis on three-dimensional (3D) structure prediction. A promising coarse-grained model for predicting RNA 3D structure, stability, and salt effects is also introduced briefly. Finally, we discuss the major challenges in RNA 3D structure modeling.
Abstract: Algorithms based on combination learning are usually superior to a single classification algorithm on the task of protein secondary structure prediction. However, the assignment of weights to the base classifiers usually lacks decision-making evidence. In this paper, we propose a protein secondary structure prediction method with a dynamic, self-adaptive combination strategy based on entropy, in which the weights are assigned according to the entropy of the posterior probabilities output by the base classifiers. A higher entropy value means a lower weight for the base classifier. The final structure prediction is decided by the weighted combination of posterior probabilities. Extensive experiments on the CB513 dataset demonstrate that the proposed method outperforms existing methods and can effectively improve prediction performance.
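The entropy-based weighting can be sketched as follows for a single residue. The abstract states only that higher entropy means lower weight; the specific rule used here (weight proportional to the maximum possible entropy minus the observed entropy, then normalized) and the names `entropy` and `combine` are assumptions chosen for illustration and are not necessarily the paper's formula.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one posterior distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def combine(posteriors):
    """Entropy-weighted combination of base-classifier posteriors.

    posteriors: list of probability lists, one per base classifier,
                all over the same ordered set of classes.
    Returns (weights, combined posterior).
    Assumed rule: weight is proportional to (max entropy - entropy).
    """
    n_classes = len(posteriors[0])
    max_h = math.log(n_classes)
    # Clamp at zero to absorb floating-point noise for uniform posteriors.
    raw = [max(0.0, max_h - entropy(p)) for p in posteriors]
    total = sum(raw)
    if total <= 1e-12:
        # Every classifier is maximally uncertain: fall back to equal weights.
        weights = [1.0 / len(posteriors)] * len(posteriors)
    else:
        weights = [r / total for r in raw]
    combined = [
        sum(w * p[k] for w, p in zip(weights, posteriors))
        for k in range(n_classes)
    ]
    return weights, combined


# A confident classifier (low entropy) receives more weight than an unsure one.
weights, combined = combine([[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]])
predicted_class = max(range(len(combined)), key=combined.__getitem__)
print(weights, combined, predicted_class)
```

Because the weights are recomputed from each residue's posteriors, the combination adapts per prediction rather than fixing one weight per classifier, which matches the "dynamic" wording of the abstract.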
Abstract: Protein structure prediction is one of the most essential objectives of theoretical chemistry and bioinformatics, as it is of vital importance in medicine, biotechnology, and other fields. Protein secondary structure prediction (PSSP) plays a significant role in the prediction of protein tertiary structure, as it bridges the gap between protein primary sequences and tertiary structure prediction. Protein secondary structures are classified into two categories: the 3-state category and the 8-state category. Predicting the 3 states and the 8 states of secondary structure from protein sequences are called the Q3 and Q8 prediction problems, respectively. The 8 classes of secondary structure reveal more precise structural information for a variety of applications than the 3 classes; however, Q8 prediction has been found to be very challenging, which is why all previous work in PSSP has focused on Q3 prediction. In this paper, we develop an ensemble machine learning (ML) approach for Q8 PSSP to compare the performance of ensemble learning algorithms with that of individual ML algorithms. The ensemble members considered for constructing the ensemble models are well-known classifiers, namely SVM (Support Vector Machine), KNN (K-Nearest Neighbor), DT (Decision Tree), RF (Random Forest), and NB (Naïve Bayes), with two feature extraction techniques, namely LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis). Experiments were conducted to evaluate the performance of single models and ensemble models, with PCA and LDA, in Q8 PSSP. The novelty of this paper lies in the introduction of ensemble learning to the Q8 PSSP problem. The experimental results confirmed that ensemble ML models are more accurate than individual ML models. They also indicated that features extracted by LDA are more effective than those extracted by PCA.
Funding: This work was supported by the National Science Foundation (Grant DMR-9812351).
Abstract: The structure type for the crystal of 4,4'-bis-(2-hydroxy-ethoxyl)-biphenyl 1 has been predicted using the previously developed interfacial model for small organic molecules. Based on the calculated hydrophobic-to-hydrophilic volume ratio of 1, this model predicts the crystal structure to be of lamellar or bicontinuous type, which has been confirmed by X-ray single-crystal structure analysis (C20H26O6, monoclinic, P21/c, a = 16.084(1), b = 6.0103(4), c = 9.6410(7) Å, β = 103.014(2)°, V = 908.1(1) Å³, Z = 2, Dc = 1.325 g/cm³, F(000) = 388, μ = 0.097 mm⁻¹, MoKα radiation, λ = 0.71073 Å, R = 0.0382 and wR = 0.0882 with I > 2σ(I) for 7121 reflections collected, 1852 unique reflections, and 170 parameters). As predicted, the hydrophobic and hydrophilic portions of 1 form lamellae. The same interfacial model is applied to other amphiphilic small-molecule organic systems for structure type prediction.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 31570722).
Abstract: The secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is widely needed given the increasing discovery of noncoding RNAs. In the last decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has reached a bottleneck. Here we present a method for RNA secondary structure prediction using direct coupling analysis and a remove-and-expand algorithm that shows better performance than four existing popular multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
Funding: Supported by the National Natural Science Foundation of China (60133010, 70071042, 60073043).
Abstract: This paper introduces the applications of evolutionary algorithms (EAs) in the prediction of protein secondary and tertiary structures, reviews recent studies on solving protein structure prediction problems using EAs, and discusses the challenges and prospects of EAs applied to protein structure modeling.
Abstract: A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modifications to the energy parameters of various loops were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were permitted to occur if reasonable. We have applied this approach to a number of RNA sequences. The prediction accuracies we obtained were higher than those in published papers.
Funding: The National Natural Science Foundation of China (No. 10471051) and the National Basic Research Program (973) of China (No. 2004CB318000).
Abstract: A three-dimensional off-lattice protein model with two species of monomers, hydrophobic and hydrophilic, is studied. Inspired by the law of reciprocity among things in the physical world, a heuristic quasi-physical algorithm for the protein structure prediction problem is put forward. First, by carefully simulating the movement of smooth elastic balls in the physical world, the algorithm finds low-energy configurations for a given monomer chain. An "off-trap" strategy is then proposed to escape from local minima. Experimental results show promising performance. For all chains with lengths 13 ≤ n ≤ 55, the proposed algorithm finds states with lower energy than the putative ground states reported in the literature. Furthermore, for chain lengths n = 21, 34, and 55, the algorithm finds new low-energy configurations different from those given in the literature.
Funding: The National Key R&D Program of China (Grant No. 2020YFA0907000) and the National Natural Science Foundation of China (Grant Nos. 32271297, 62072435, 31770775, and 31671369) provided financial support for this study and publication charges.
Abstract: Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start by assuming a probability distribution of protein structures given a target sequence and then find the most likely structure; and computer scientists formulate protein structure prediction as an optimization problem, either finding the structural conformation with the lowest energy or minimizing the difference between the predicted structure and the native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we present a survey of the efforts in protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories that interpret the neural networks and knowledge of protein folding are still highly desired.
Funding: the National Natural Science Foundation of China (No. 21374117) and the 100 Talents Program of the Chinese Academy of Sciences for financial support
Abstract: Tannases produced by filamentous fungi belong to a family of important hydrolases of gallotannins and have broad industrial applications. Until now, however, the 3D structures of fungal tannases have not been reported. The protein sequence deduced from the cDNA sequence obtained by RT-PCR amplification was identified as a tannase through sequence alignment and phylogenetic analysis. Structure models based on the tannase sequence were generated using I-TASSER, and the model that best matched the surface charge density-pH titration profile was selected as the final structure for the tannase from Aspergillus niger N5-5. This work provides an effective method for protein structure research. The structure constructed in this work should be very important for understanding the enzyme's bioactivity and for the further development of fungal tannases.
Funding: support from the National Natural Science Foundation of China (Grant Nos. 11974162 and 11834006), the National Key R&D Program of China (Grant No. 2016YFA0300404), and the Fundamental Research Funds for the Central Universities.
Abstract: Crystal structure prediction algorithms have become powerful tools for materials discovery in recent years; however, they are usually limited to relatively small systems. The main challenge is that the number of local minima grows exponentially with the system size. In this work, we propose two crossover-mutation schemes based on graph theory that accelerate evolutionary structure searching through automatic decomposition. These schemes can detect molecules or clusters inside periodic networks using quotient graphs for crystals, and the decomposition can dramatically reduce the search space. Test cases, including the high-pressure phases of methane, ammonia, MgAl2O4, and boron, show that these new evolution schemes can significantly improve the success rate and search efficiency compared with the standard method in both isolated and extended systems.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11774158, 11974173, 11774157, and 11934008).
Abstract: RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help to understand RNA structures and their stabilizing factors, thus providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although their usage in protein-related fields has a long history, the use of ML methods in predicting RNA tertiary structures is new and rare. Here, we review recent advances in using ML methods for RNA structure prediction and discuss the advantages and limitations, as well as the difficulties and potential, of these approaches when applied in the field.
Funding: This work was financially supported by the Fundamental Research Funds for the Central Universities, the National Natural Science Foundation of China (Grant Nos. 11965005 and 11964026), the 111 Project (No. B17035), and the Natural Science Basic Research Plan in Shaanxi Province of China (Grant Nos. 2020JM-186 and 2020JM-621).
Abstract: Within a structure prediction method, machine learning can be used instead of density functional theory (DFT) to predict material properties, thereby accelerating the material search process. In this paper, we established a data set of carbon materials by high-throughput calculation with available carbon structures obtained from the Samara Carbon Allotrope Database. We then trained a machine learning (ML) model that specifically predicts the elastic moduli (bulk modulus, shear modulus, and Young's modulus) and confirmed that its accuracy is better than that of AFLOW-ML in predicting the elastic moduli of a carbon allotrope. We further combined our ML model with the CALYPSO code to search for new carbon structures with a high Young's modulus. A new carbon allotrope not included in the Samara Carbon Allotrope Database, named Cmcm-C24, which exhibits a hardness greater than 80 GPa, was revealed for the first time. The Cmcm-C24 phase was identified as a semiconductor with a direct bandgap. The structural stability, elastic moduli, and electronic properties of the new carbon allotrope were systematically studied, and the results demonstrate the feasibility of using ML methods to accelerate the material search process.
Abstract: There is a large gap between the number of membrane protein (MP) sequences and that of their decoded 3D structures, especially high-resolution structures, due to difficulties in the crystal preparation of MPs. However, detailed knowledge of the 3D structure is required for a fundamental understanding of the function of an MP and of the interactions between the protein and its inhibitors or activators. In this paper, some computational approaches that have been used to predict MP structures are discussed and compared.
Funding: partially supported by the National Council of Science and Technology of México (CONACyT) under Grant Nos. 105060 and 99276
Abstract: The HP model for protein structure prediction abstracts the fact that hydrophobicity is a dominant force in the protein folding process. This challenging combinatorial optimization problem has been widely addressed through metaheuristics. The evaluation function is a key component for the success of metaheuristics; the poor discrimination of the conventional evaluation function of the HP model has motivated the proposal of alternative formulations for this component. This comparative analysis examines the effectiveness of seven different evaluation functions for the HP model. It analyzes the degree of discrimination provided by each of the studied functions, their capability to preserve a rank ordering among potential solutions that is consistent with the original objective of the HP model, and their effect on the performance of local search methods. The results indicate that studying alternative evaluation schemes for the HP model is a highly valuable direction that merits more attention.
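To make the "conventional evaluation function" mentioned in the HP-model abstract concrete, the following is a minimal sketch of the standard HP energy: each pair of hydrophobic (H) residues that are lattice neighbors but not consecutive in the chain contributes -1. The 2D square lattice, the function name `hp_energy`, and the coordinate representation are assumptions made for illustration; the abstract does not specify the lattice, and none of the seven evaluation functions compared in that study is reproduced here.

```python
from typing import List, Tuple


def hp_energy(sequence: str, coords: List[Tuple[int, int]]) -> int:
    """Conventional HP energy on a 2D square lattice (assumed setting).

    Returns -1 for every pair of H residues that occupy adjacent lattice
    sites without being neighbors along the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    # Consecutive residues must sit on adjacent lattice sites.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues are not lattice neighbors")

    index_at = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for site in ((x + 1, y), (x, y + 1)):
            j = index_at.get(site)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

Because this function takes only integer values bounded by the number of H-H contacts, many distinct conformations share the same score, which is the kind of poor discrimination that the abstract says motivated alternative formulations.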
Funding: financially supported by the National Natural Science Foundation of China (Nos. 41831288 and 51672257); the Fundamental Research Funds for the Central Universities (Nos. 2652018305 and 2652017335); Guangdong Innovation Research Team for Higher Education (No. 2017KCXTD030); the High-Level Talents Project of Dongguan University of Technology (No. KCYKYQD2017017); Engineering Research Center of None-food Biomass Efficient Pyrolysis and Utilization Technology of Guangdong Higher Education Institutes (No. 2016GCZX009); and the Russian Science Foundation (No. 19-77-10013).
Abstract: Mineral apatite compounds have attracted significant interest due to their chemical stability and adjustable hexagonal structure, which make them suitable as new photovoltaic functional materials. The band gap of natural apatite is ~5.45 eV, and such a large value limits its applications in catalysis and energy devices. In this research, we designed a method to narrow the band gap via the tetrahedral substitution effect in apatite-based compounds. Density functional theory (DFT) calculations and experimental investigation of the electronic and optical properties revealed that the continuous incorporation of [MO_(4)]^(4-) tetrahedrons (M = Si, Ge, Sn, and Mn) into the crystal lattice can significantly reduce the band gap. In particular, this phenomenon was observed when the [MnO_(4)]^(4-) tetrahedron replaces the [PO_(4)]^(4-) tetrahedron, because of the formation of a Mn 3d-derived conduction band minimum (CBM) and its interaction with other elements, leading to band broadening and an obvious reduction of the band gap. This approach allowed us to propose a novel scheme for band gap engineering of apatite-based compounds toward modification across the entire spectral range.
Funding: the National Natural Science Foundation of China (No. 20475068) and the Guangdong Provincial Natural Science Foundation (No. 031577).
Abstract: Based on the concept of ant colony optimization and the idea of a population in genetic algorithms, this paper proposes a novel global optimization algorithm, called hybrid ant colony optimization (HACO), to tackle continuous-space optimization problems. It was compared with other well-known stochastic methods on the optimization of benchmark functions and was also used to efficiently select an appropriate dilation by optimizing the wavelet power spectrum of the hydrophobic sequence of a protein, which is the key step in using the continuous wavelet transform (CWT) to predict α-helices and connecting peptides.
Abstract: The secondary structure of a protein is critical for establishing a link between the protein's primary and tertiary structures. For this reason, it is important to design methods for accurate protein secondary structure prediction. Most of the existing computational techniques for protein structural and functional prediction are based on machine learning with shallow frameworks. Different deep learning architectures have already been applied to the protein secondary structure prediction problem. In this study, deep learning based models, i.e., a convolutional neural network and a long short-term memory network, were proposed for protein secondary structure prediction. The input to the proposed models is amino acid sequences derived from the CulledPDB dataset. Hyperparameter tuning with cross-validation was employed to obtain the best parameters for the proposed models. The proposed models enable effective processing of amino acids and attain approximately 87.05% and 87.47% Q3 accuracy for the convolutional neural network and long short-term memory models, respectively.
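The Q3 accuracy reported in the secondary structure abstract is, in its usual definition, the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. The sketch below illustrates that definition only; the function name `q3_accuracy` and the string encoding of labels are assumptions, and the abstract does not say how the authors reduced structure annotations to three states or aggregated scores across proteins.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues with a correctly predicted three-state label.

    Both arguments are strings over the alphabet H (helix), E (strand),
    and C (coil), one character per residue (assumed encoding).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    allowed = set("HEC")
    if not set(predicted) <= allowed or not set(observed) <= allowed:
        raise ValueError("labels must be H, E, or C")

    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, a prediction that matches five of six residues scores about 83.3%. Per-residue accuracy of this kind says nothing about whether whole helices or strands are recovered as contiguous segments.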
Abstract: Inverse materials design tackles the challenge of finding materials with desired properties, tailored to specific applications, by combining atomistic simulations and optimization methods. The search for optimal materials requires one to survey large spaces of candidate solids. These spaces of materials can encompass both known and hypothetical compounds. When hypothetical compounds are explored, it becomes crucial to determine which ones are stable (and can be synthesized) and which are not. Crystal structure prediction is a necessary step for theoretically assessing the stability of a hypothetical material and is therefore a crucial step in inverse materials design protocols. Here, we describe how biologically inspired global optimization methods can efficiently predict the stable crystal structure of solids. Specifically, we discuss the application of genetic algorithms to search for optimal atom configurations in systems in which the underlying lattice is given, and of evolutionary algorithms to address the general lattice-type prediction problem.