Biological functions of proteins play a key role in the development of any organism. The gene tbx22 is a member of a phylogenetically conserved family of genes that share a common DNA-binding domain, the T-box. This study examines the similarity of the developmental patterns influenced by the transcription factors TBX22 and tbx22 in H. sapiens and D. rerio, respectively. The secondary and tertiary structures of the proteins are predicted using standard structure prediction software such as Phyre2, PredictProtein, SWISS-MODEL, and PSIPRED, and the proteins are compared with each other for homology. Protein homology prediction shows more than 65% homology between the two organisms. Superimposing the predicted protein structures reveals conserved domains between the human and zebrafish proteins. Additional supporting data from Genomatix MatBase and MatInspector show higher matrix family scores for BRAC (Brachyury gene, mesoderm developmental factor) in human and zebrafish. Transcription factor and promoter element analysis with Transcriptome Viewer, Gene2Promoter, and GenomeInspector reveals a high degree of homology between the two organisms. The bioinformatic, proteomic, and protein structural analysis approaches shown here explain in detail the relationship between the human and zebrafish tbx22 gene, protein, and transcription factor. These studies also support zebrafish as a predictive model for numerous developmental patterning events in higher vertebrates.
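The abstract reports a cross-species percentage without saying how it was computed. Assuming it is pairwise sequence identity over an existing alignment (an assumption: tools may instead report similarity, or divide by alignment length rather than by ungapped columns), the calculation can be sketched as follows. The aligned fragments are made up and are not TBX22 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator in this sketch
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments: 5 of the 7 ungapped columns match
print(round(percent_identity("MKT-LLVA", "MKSALLVG"), 1))
```

Note that homology itself is a yes/no evolutionary relationship; a percentage such as this one is evidence for it, not a degree of it.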
Protein–protein interactions (PPIs) are important for many biological processes. A theoretical understanding of the structural factors that determine interaction sites will help to reveal the underlying mechanism of protein–protein interactions. Likewise, understanding the structures of protein complexes helps to explore their functions, and accurately predicting protein complexes from PPI networks helps us understand the relationships between proteins. Over the past few decades, many methods have been proposed for predicting protein interactions and protein complex structures. In this review, we first briefly introduce the methods and servers for predicting protein interaction sites and interface residue pairs, and then introduce protein complex structure prediction methods, including template-based and template-free prediction. Subsequently, we introduce methods for predicting protein complexes from PPI networks and for predicting missing links in PPI networks. Finally, we briefly summarize the application of machine learning and deep learning models to protein structure prediction and interaction site prediction.
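As one concrete instance of missing-link prediction in a PPI network, the sketch below scores every unlinked protein pair by the Jaccard index of their neighbour sets, a common topological baseline. It is not necessarily one of the methods covered by the review, and the five-protein network is a hypothetical toy.

```python
from itertools import combinations


def jaccard_scores(edges):
    """Rank unlinked protein pairs by the Jaccard index of their neighbour sets."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v in neighbours[u]:
            continue  # already an observed interaction, nothing to predict
        union = neighbours[u] | neighbours[v]
        shared = neighbours[u] & neighbours[v]
        scores[(u, v)] = len(shared) / len(union) if union else 0.0
    # Highest-scoring pairs are the most likely missing interactions
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ppi = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
for pair, score in jaccard_scores(ppi)[:3]:
    print(pair, round(score, 3))
```

Here B and C share both of their partners (A and D), so they receive the top score even though no B-C interaction was observed.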
Algorithms based on combination (ensemble) learning usually outperform a single classification algorithm on the task of protein secondary structure prediction. However, the assignment of weights to the base classifiers usually lacks a principled basis for the decision. In this paper, we propose a protein secondary structure prediction method with a dynamic, self-adaptive combination strategy based on entropy, in which weights are assigned according to the entropy of the posterior probabilities output by the base classifiers: the higher the entropy, the lower the weight of that base classifier. The final structure prediction is decided by the weighted combination of the posterior probabilities. Extensive experiments on the CB513 dataset demonstrate that the proposed method outperforms existing methods and effectively improves prediction performance.
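The abstract states only that higher entropy means lower weight. The sketch below assumes one simple rule with that property, weights inversely proportional to the Shannon entropy of each classifier's per-residue posterior, normalised to sum to 1; the paper's exact formula may differ, and `eps` is an assumed guard against zero entropy.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one posterior probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weighted_combination(posteriors, eps=1e-6):
    """Combine per-residue posteriors from several base classifiers.

    Assumed rule: weight_k proportional to 1 / (entropy_k + eps), so a confident
    (low-entropy) classifier dominates that residue's decision.
    """
    raw = [1.0 / (entropy(p) + eps) for p in posteriors]
    total = sum(raw)
    weights = [w / total for w in raw]
    n_classes = len(posteriors[0])
    combined = [sum(w * p[c] for w, p in zip(weights, posteriors)) for c in range(n_classes)]
    return weights, combined


STATES = ("H", "E", "C")
confident = [0.05, 0.90, 0.05]  # low entropy
uncertain = [0.40, 0.30, 0.30]  # high entropy
weights, combined = entropy_weighted_combination([confident, uncertain])
print(STATES[max(range(3), key=combined.__getitem__)], [round(w, 2) for w in weights])
```

Because the weights are recomputed for every residue, the combination adapts dynamically: a classifier that is confident at one position and unsure at the next is weighted differently at each.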
Based on the concept of ant colony optimization and the idea of a population in genetic algorithms, a novel global optimization algorithm called hybrid ant colony optimization (HACO) is proposed in this paper to tackle continuous-space optimization problems. It was compared with other well-known stochastic methods on benchmark functions, and was also used to efficiently select an appropriate dilation by optimizing the wavelet power spectrum of a protein's hydrophobicity sequence, which is the key step in using the continuous wavelet transform (CWT) to predict α-helices and connecting peptides.
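The abstract does not give HACO's update rules. To illustrate how ant colony ideas carry over to continuous spaces, the sketch below swaps in a simplified ACO_R-style scheme: a ranked archive of solutions plays the role of the population, and each "ant" samples a new point from a Gaussian centred on an archive member. It is not the authors' HACO, the wavelet dilation step is not shown, and the parameter names and defaults (`archive_size`, `n_ants`, `q`, `xi`) are assumptions. The sphere function stands in for a benchmark.

```python
import math
import random


def aco_r_minimize(f, dim, bounds, archive_size=10, n_ants=10, q=0.2, xi=0.85,
                   iterations=300, seed=0):
    """Minimise f over a box using an ACO_R-style solution archive (a sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    k = archive_size
    # Solution archive acting as the population, kept sorted best-first
    archive = sorted(([rng.uniform(lo, hi) for _ in range(dim)] for _ in range(k)), key=f)
    # Rank-based weights: better-ranked solutions guide more ants
    weights = [math.exp(-(r ** 2) / (2 * (q * k) ** 2)) for r in range(k)]
    for _ in range(iterations):
        new = []
        for _ in range(n_ants):
            guide = archive[rng.choices(range(k), weights=weights)[0]]
            x = []
            for d in range(dim):
                # Step size follows the archive's spread around the guide
                spread = sum(abs(s[d] - guide[d]) for s in archive) / (k - 1)
                value = rng.gauss(guide[d], xi * spread)
                x.append(min(max(value, lo), hi))  # keep inside the search box
            new.append(x)
        # Elitist update: keep the k best of old and new solutions
        archive = sorted(archive + new, key=f)[:k]
    return archive[0], f(archive[0])


def sphere(x):
    return sum(v * v for v in x)


best, value = aco_r_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, value)
```

The Gaussian width shrinks automatically as the archive converges, which is what lets a pheromone-like mechanism work without a discrete graph.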
In this paper, the applications of evolutionary algorithms (EAs) to the prediction of protein secondary and tertiary structures are introduced, recent studies that solve protein structure prediction problems with EAs are reviewed, and the challenges and prospects of applying EAs to protein structure modeling are analyzed and discussed.
The hydrophobic-polar (HP) lattice model is an important simplified model for studying protein folding. In this paper, we present an improved ant colony optimization (ACO) algorithm for protein structure prediction. In the algorithm, a "clone" method is applied to handle infeasible structures, and a "point mutation and reconstruction" method is applied in the local search phase. The empirical results show that the presented method is feasible and effective for solving the protein structure prediction problem, and notable improvements in CPU time are obtained.
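An ACO for the HP model typically has ants build a conformation as a string of relative moves and then score it. The sketch below assumes the 2D square lattice (the abstract does not state the dimension), encodes moves as `L`/`R`/`S` (left, right, straight), rejects self-intersecting chains as infeasible, and computes the standard HP energy of -1 per non-consecutive H-H lattice contact. The "clone" and "point mutation and reconstruction" steps are not shown.

```python
def fold_coordinates(directions):
    """Map relative moves 'L', 'R', 'S' to 2D lattice sites; None if the chain self-intersects."""
    coords = [(0, 0), (1, 0)]  # first bond fixed along +x
    heading = (1, 0)
    for move in directions:
        dx, dy = heading
        if move == "L":
            heading = (-dy, dx)
        elif move == "R":
            heading = (dy, -dx)
        elif move != "S":
            raise ValueError(f"unknown move {move!r}")
        x, y = coords[-1]
        nxt = (x + heading[0], y + heading[1])
        if nxt in coords:
            return None  # infeasible: two residues on the same lattice site
        coords.append(nxt)
    return coords


def hp_energy(sequence, directions):
    """Standard HP energy: -1 per H-H lattice contact between non-consecutive residues."""
    if len(directions) != len(sequence) - 2:
        raise ValueError("need exactly len(sequence) - 2 relative moves")
    coords = fold_coordinates(directions)
    if coords is None:
        return None
    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for nb in ((x + 1, y), (x, y + 1)):  # look right and up only, so each contact counts once
            j = position.get(nb)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


print(hp_energy("HPPH", "LL"), hp_energy("HPPH", "SS"))
```

The U-shaped fold `LL` brings the two H residues into contact, while the straight chain `SS` does not.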
Protein structure prediction is one of the most essential objectives pursued in theoretical chemistry and bioinformatics, as it is of vital importance in medicine, biotechnology, and other fields. Protein secondary structure prediction (PSSP) plays a significant role in the prediction of protein tertiary structure, as it bridges the gap between protein primary sequences and tertiary structure prediction. Protein secondary structures are classified in two ways: a 3-state scheme and an 8-state scheme. Predicting the 3 states and the 8 states of secondary structure from protein sequences are called the Q3 and Q8 prediction problems, respectively. The 8 classes of secondary structure reveal more precise structural information for a variety of applications than the 3 classes; however, Q8 prediction has been found to be very challenging, which is why most previous work in PSSP has focused on Q3 prediction. In this paper, we develop an ensemble machine learning (ML) approach for Q8 PSSP to compare the performance of ensemble learning algorithms with that of individual ML algorithms. The ensemble members are well-known classifiers, namely SVM (Support Vector Machines), KNN (K-Nearest Neighbor), DT (Decision Tree), RF (Random Forest), and NB (Naïve Bayes), combined with two feature extraction techniques, namely LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis). Experiments were conducted to evaluate the performance of single models and ensemble models, with PCA and LDA, in Q8 PSSP. The novelty of this paper lies in the introduction of ensemble learning to the Q8 PSSP problem. The experimental results confirmed that ensemble ML models are more accurate than individual ML models. They also indicated that features extracted by LDA are more effective than those extracted by PCA.
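The abstract does not say which combination rule the ensembles use. Hard majority voting is one common choice, and the sketch below applies it per residue to Q8 label strings from several base classifiers. It assumes the DSSP 8-state alphabet, with `-` standing in for the coil class (conventions vary between datasets), and breaks ties by a fixed state order; the three predictions are invented for illustration.

```python
from collections import Counter

Q8_STATES = "HGIEBTS-"  # DSSP 8 states; '-' used here for coil (datasets vary)


def majority_vote(predictions):
    """Per-residue hard majority vote over equal-length Q8 label strings."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all base predictions must cover the same residues")
    ensemble = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # Ties are broken by the fixed order of Q8_STATES
        ensemble.append(next(s for s in Q8_STATES if counts.get(s, 0) == best))
    return "".join(ensemble)


svm = "HHHGTT-E"
knn = "HHGGTS-E"
rf = "HHHGSS-B"
print(majority_vote([svm, knn, rf]))
```

Soft voting (averaging class probabilities) is the usual alternative when the base classifiers expose calibrated probabilities.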
The secondary structure of a protein is critical for establishing a link between the protein's primary and tertiary structures. For this reason, it is important to design methods for accurate protein secondary structure prediction. Most existing computational techniques for protein structural and functional prediction are based on machine learning with shallow frameworks, although different deep learning architectures have already been applied to the protein secondary structure prediction problem. In this study, deep learning models, namely a convolutional neural network and a long short-term memory network, were proposed for protein secondary structure prediction. The inputs to the proposed models are amino acid sequences derived from the CulledPDB dataset. Hyperparameter tuning with cross-validation was employed to obtain the best parameters for the proposed models. The proposed models process amino acid sequences effectively and attain approximately 87.05% and 87.47% Q3 accuracy for the convolutional neural network and long short-term memory models, respectively.
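Q3 accuracy is the percentage of residues whose predicted 3-state label (H, E, C) matches the observed one. When observed labels come as DSSP 8-state codes, they are first reduced to 3 states. The sketch assumes one widely used reduction (H, G, I to H; E, B to E; everything else to C); papers differ on this mapping, and the abstract does not say which one was used. The label strings are invented.

```python
# One common 8-to-3 state reduction; other papers map G, I or B differently
Q8_TO_Q3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E",
            "T": "C", "S": "C", "-": "C"}


def to_q3(q8_labels):
    return "".join(Q8_TO_Q3[s] for s in q8_labels)


def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted 3-state label equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


observed = to_q3("HHHGTS-EEB")  # reduces to "HHHHCCCEEE"
predicted = "HHHCCCCEEC"        # 8 of 10 residues agree
print(q3_accuracy(predicted, observed))
```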
A three-dimensional off-lattice protein model with two species of monomers, hydrophobic and hydrophilic, is studied. Inspired by the law of reciprocity among things in the physical world, a heuristic quasi-physical algorithm for the protein structure prediction problem is put forward. First, by carefully simulating the movement of smooth elastic balls in the physical world, the algorithm finds low-energy configurations for a given monomer chain. An "off-trap" strategy is then proposed to escape from local minima. Experimental results show promising performance. For all chains with lengths 13 ≤ n ≤ 55, the proposed algorithm finds states with lower energy than the putative ground states reported in the literature. Furthermore, for chain lengths n = 21, 34, and 55, the algorithm finds new low-energy configurations different from those given in the literature.
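The abstract does not write out the energy being minimised. Assuming it is the widely used three-dimensional AB off-lattice model of Irbäck et al. (hydrophobic A and hydrophilic B monomers joined by unit-length bonds), the energy combines bending and torsion terms on the bond vectors with a species-dependent Lennard-Jones term between non-bonded monomers. The defaults `k1 = -1`, `k2 = 0.5` and the coefficients C(A,A) = 1, C(A,B) = C(B,B) = 0.5 follow that assumed model, not the paper.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ab_energy_3d(sequence, coords, k1=-1.0, k2=0.5):
    """Energy of a 3D AB-model chain (assumed Irbäck-style form).

    sequence: string over {'A', 'B'}; coords: 3D points spaced one unit apart,
    so that bond-vector dot products are cosines.
    """
    n = len(sequence)
    bonds = [_sub(coords[i + 1], coords[i]) for i in range(n - 1)]
    bending = -k1 * sum(_dot(bonds[i], bonds[i + 1]) for i in range(n - 2))
    torsion = -k2 * sum(_dot(bonds[i], bonds[i + 2]) for i in range(n - 3))
    lj = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):  # non-bonded pairs only
            c = 1.0 if sequence[i] == sequence[j] == "A" else 0.5
            r = math.dist(coords[i], coords[j])
            lj += 4.0 * c * (r ** -12 - r ** -6)
    return bending + torsion + lj


straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(ab_energy_3d("AAA", straight))
```

A search method such as the quasi-physical algorithm would move the coordinates (keeping bond lengths at 1) to drive this value down.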
The deep-learning protein structure prediction method AlphaFold2 has garnered enormous attention beyond the realm of structural biology for its groundbreaking contribution to solving the "protein folding problem". In this perspective, we explore the connection between protein structure studies and environmental research, delving into the potential for addressing specific environmental challenges. Proteins are promising for environmental applications because of the functional diversity endowed by their structural complexity. However, structural studies on proteins of environmental significance remain scarce. Here, we present the opportunities for studying such proteins offered by advances in experimental structure determination and deep-learning prediction methods. Specifically, the latest progress in environmental research using cryogenic electron microscopy (cryo-EM) is highlighted. Cryo-EM allows the structures of protein complexes to be determined in their native state within cells at molecular resolution, revealing environmentally associated structural dynamics. With remarkable advances in computational power and experimental resolution, the study of protein structure and dynamics has reached unprecedented depth and accuracy. These advances will undoubtedly accelerate the establishment of comprehensive databases of environmental protein structures and functions. Tremendous opportunities exist for protein engineering to enable innovative solutions for environmental applications, such as the degradation of persistent contaminants and the recovery of valuable metals and rare earth elements.
Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start by assuming a probability distribution of protein structures given a target sequence and then find the most likely structure; while computer scientists formulate protein structure prediction as an optimization problem, namely finding the structural conformation with the lowest energy or minimizing the difference between the predicted structure and the native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we present a survey of efforts in protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift in research paradigms in the era of deep learning. In short, algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories that interpret the neural networks and knowledge of protein folding are still highly desired.
Proteins are integral actors in essential life processes, making protein research a fundamental domain with the potential to drive advances in pharmaceuticals and disease research. Within protein research, there is a pressing need to uncover protein functions and unravel their underlying mechanisms. Because experimental investigations are costly and limited in throughput, computational models offer a promising alternative for accelerating protein function annotation. In recent years, protein pre-training models have shown notable advances across multiple prediction tasks, which suggests a promising route to tackling the intricate downstream task of protein function prediction. In this review, we describe the historical evolution and research paradigms of computational methods for predicting protein function. We then summarize progress in protein and molecule representation as well as feature extraction techniques. Furthermore, we assess the performance of machine learning-based algorithms across various objectives in protein function prediction, thereby offering a comprehensive perspective on progress in this field.
Subcellular localization of proteins can provide key hints for inferring their functions and structures in cells. With recent breakthroughs in molecular imaging techniques, 2D bioimages have become increasingly popular for automatically analyzing protein subcellular location patterns. Compared with the widely used 1D amino acid sequence data, images of protein distribution are more intuitive and interpretable, making them a better choice in many applications that reveal the dynamic characteristics of proteins, such as detecting protein translocation and quantifying proteins. In this paper, we systematically review recent progress in automated image-based protein subcellular location prediction and classify it into four categories: the growth of bioimage databases, the description of subcellular location distribution patterns, classification methods, and applications of the prediction systems. We also discuss some potential directions in this field.
The number of protein sequences available in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for understanding how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity searches, biological network information inferred from protein–protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three sources of information are combined by consensus averaging to make the final prediction. Our approach was tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of the human tripartite motif-containing 22 (TRIM22) protein predicted by QAUST can be experimentally validated.
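The abstract says the three information sources are combined by "consensus averaging" without giving the formula. A minimal sketch, assuming each source yields a confidence score per GO term, that the sources are weighted equally, and that a term absent from a source counts as 0 there. The GO identifiers are real terms used purely as labels; the scores are made up.

```python
def consensus_average(*source_scores):
    """Unweighted mean of per-term scores; a term missing from a source scores 0 there."""
    terms = set().union(*source_scores)
    n = len(source_scores)
    return {t: sum(s.get(t, 0.0) for s in source_scores) / n for t in terms}


# Hypothetical scores from the structure, network and motif sources
structure = {"GO:0004672": 0.9, "GO:0005524": 0.6}
network = {"GO:0004672": 0.7}
motifs = {"GO:0004672": 0.8, "GO:0005524": 0.3}

combined = consensus_average(structure, network, motifs)
for term, score in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 3))
```

Treating a missing term as 0 rewards agreement between sources: a term supported by only one source is pulled down, which is the intent of a consensus.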
Predicting protein functions is an important issue in the post-genomic era. This paper studies several network-based kernels, including the locally linear embedding (LLE) kernel, the diffusion kernel, and the Laplacian kernel, to uncover the relationship between protein functions and protein–protein interactions (PPIs). The authors first construct kernels based on PPI networks and then apply support vector machine (SVM) techniques to classify proteins into different functional groups. Five-fold cross-validation is then applied to 359 selected GO terms to compare the performance of the different kernels and of guilt-by-association methods, including neighbor-counting and chi-square methods. Finally, the authors predict the functions of some unknown genes and partially verify the accuracy of these predictions using information from other data sources.
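Of the kernels listed, the diffusion kernel has a compact closed form: with graph Laplacian L = D - A, K = exp(-βL). The sketch below builds it for a tiny PPI graph using a truncated Taylor series, which is adequate only for small graphs and modest β (a production version would use an eigendecomposition or a library matrix exponential). The value of `beta`, the number of series terms, and the three-protein path graph are assumptions for illustration.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def diffusion_kernel(n, edges, beta=0.5, terms=30):
    """K = exp(-beta * L) via a truncated Taylor series (small graphs only)."""
    M = [[-beta * v for v in row] for row in laplacian(n, edges)]
    K = [[float(i == j) for j in range(n)] for i in range(n)]  # identity: 0th term
    term = [row[:] for row in K]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, M)]  # M^k / k!
        K = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(K, term)]
    return K


# Path graph 0 - 1 - 2: similarity decays with network distance
K = diffusion_kernel(3, [(0, 1), (1, 2)])
print([round(v, 3) for v in K[0]])
```

The resulting matrix is symmetric and positive semidefinite, so it can be handed to an SVM that accepts a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`).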
The HP model for protein structure prediction abstracts the fact that hydrophobicity is a dominant force in the protein folding process. This challenging combinatorial optimization problem has been widely addressed through metaheuristics. The evaluation function is a key component in the success of metaheuristics; the poor discrimination of the conventional evaluation function of the HP model has motivated the proposal of alternative formulations for this component. This comparative analysis examines the effectiveness of seven different evaluation functions for the HP model. It analyzes the degree of discrimination provided by each function, its capability to preserve a rank ordering among potential solutions that is consistent with the original objective of the HP model, and its effect on the performance of local search methods. The results indicate that studying alternative evaluation schemes for the HP model is a highly valuable direction that merits more attention.
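The "poor discrimination" of the conventional HP evaluation function can be made concrete by enumerating every conformation of a short chain and counting how many share each energy value: large ties give a local search almost no gradient to follow. The sketch assumes the 2D square lattice and relative-move encoding; mirror images (L and R swapped) are counted as separate conformations. None of the seven alternative functions studied in the paper are reproduced here.

```python
from collections import Counter
from itertools import product


def conformation_energy(sequence, moves):
    """Conventional HP energy on the 2D square lattice; None if the chain self-intersects."""
    pos, heading = [(0, 0), (1, 0)], (1, 0)
    for m in moves:
        dx, dy = heading
        heading = {"L": (-dy, dx), "R": (dy, -dx), "S": (dx, dy)}[m]
        nxt = (pos[-1][0] + heading[0], pos[-1][1] + heading[1])
        if nxt in pos:
            return None
        pos.append(nxt)
    index = {p: i for i, p in enumerate(pos)}
    energy = 0
    for i, (x, y) in enumerate(pos):
        for j in (index.get((x + 1, y)), index.get((x, y + 1))):  # each contact once
            if j is not None and abs(i - j) > 1 and sequence[i] == sequence[j] == "H":
                energy -= 1
    return energy


def energy_histogram(sequence):
    """Number of feasible conformations at each energy level (exhaustive enumeration)."""
    counts = Counter()
    for moves in product("LRS", repeat=len(sequence) - 2):
        e = conformation_energy(sequence, moves)
        if e is not None:
            counts[e] += 1
    return dict(sorted(counts.items()))


print(energy_histogram("HPPH"))
print(energy_histogram("HPHPPHHPHH"))
```

For the 4-residue chain, 7 of the 9 conformations share the energy 0, so the conventional function cannot rank them at all.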
The folding dynamics and structural characteristics of the peptides RTKAWNRQLYPEW (P1) and RTKQLYPEW (P2) are investigated in this work using all-atom simulations with CHARMM. The results show that P1, a segment of an antigen, adopts an α-helical folding motif, whereas P2, derived by deleting the four residues AWNR from P1, does not form a helix and instead adopts a β-strand. Peptide P1 also experiences a more rugged energy landscape than P2. From these results, it is inferred that CD8 cytolytic T lymphocytes prefer an antigen with a β-folded structure to one with an α-helical structure.
As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve performance. However, it mainly uses proteins with experimentally supported functional annotations and does not leverage valuable information from the vast number of unannotated proteins. Recently, protein language models have been proposed that learn informative representations (e.g., Evolutionary Scale Modeling (ESM)-1b embeddings) from protein sequences through self-supervision. Here, we represented each protein by its ESM-1b embedding and used logistic regression (LR) to train a new model, LR-ESM, for AFP. The experimental results showed that LR-ESM achieved performance comparable to that of the best-performing component of NetGO 2.0. Therefore, by incorporating LR-ESM into NetGO 2.0, we developed NetGO 3.0, which substantially improves AFP performance.
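LR-ESM treats each protein's ESM-1b embedding as a fixed feature vector and fits logistic regression on top of it. The sketch below shows the idea for a single GO term as a binary problem, trained by plain batch gradient descent in pure Python. The 2-D vectors are toy stand-ins (real ESM-1b embeddings have 1280 dimensions), the learning rate and epoch count are assumptions, and in practice one would use a library implementation such as scikit-learn's `LogisticRegression`.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Binary logistic regression by batch gradient descent (one model per GO term)."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy 2-D stand-ins for protein embeddings; 1 = annotated with the GO term
X = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.1, 0.9], [0.2, 0.7], [0.3, 0.8]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(X, y)
print(round(predict_proba(w, b, [0.85, 0.2]), 2))
```

Because the embedding is learned without labels, the per-term model stays a cheap linear classifier, which is what makes it easy to add as one more component alongside NetGO 2.0's existing predictors.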
The three-dimensional (3D) structure prediction of proteins is an important task in bioinformatics. Finding energy functions that better represent residue–residue and residue–solvent interactions is a crucial way to improve prediction accuracy. Widely used contact energy functions mostly consider only the contact frequency between different types of residues; however, we find that the contact frequency also depends on the hydrophobic environment of the residues. Accordingly, we present an improved contact energy function that integrates the two factors and reflects the influence of hydrophobic interactions on the stabilization of protein 3D structure more effectively. Furthermore, a fold recognition (threading) approach based on this energy function is developed. Testing results on 20 randomly selected proteins demonstrate that, compared with common contact energy functions, the proposed energy function improves the accuracy of fold template prediction from 20% to 50% and the accuracy of sequence-template alignment from 35% to 65%.
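In threading, a sequence is placed onto a template's contact map and each contacting residue pair contributes a tabulated energy. The toy sketch below contrasts a plain contact energy (keyed only by residue classes) with an environment-aware variant that also depends on whether each residue is buried (`b`) or exposed (`e`). All table values, the two-class hydrophobic/polar split, and the multiplicative environment factor are hypothetical simplifications; the paper's function is derived from contact statistics and is not reproduced here.

```python
# Hypothetical toy parameters, for illustration only
HYDROPHOBIC = set("AVILMFWC")
PAIR_ENERGY = {("h", "h"): -1.0, ("h", "p"): -0.2, ("p", "p"): -0.1}
ENV_FACTOR = {("b", "b"): 1.5, ("b", "e"): 1.0, ("e", "e"): 0.5}


def residue_class(aa):
    return "h" if aa in HYDROPHOBIC else "p"


def contact_energy(sequence, contacts, environments=None):
    """Sum pair energies over template contacts (i, j).

    environments: optional string with 'b' (buried) or 'e' (exposed) per residue;
    when given, each pair energy is scaled by the assumed environment factor.
    """
    total = 0.0
    for i, j in contacts:
        key = tuple(sorted((residue_class(sequence[i]), residue_class(sequence[j]))))
        e = PAIR_ENERGY[key]
        if environments is not None:
            e *= ENV_FACTOR[tuple(sorted((environments[i], environments[j])))]
        total += e
    return total


contacts = [(0, 2), (1, 3)]  # contact map taken from a hypothetical template
print(contact_energy("LVKA", contacts), contact_energy("LVKA", contacts, "bebb"))
```

Fold recognition would evaluate this score for the query sequence aligned to every candidate template and pick the template with the lowest energy.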
In the post-genomic era, various computational methods that predict protein–protein interactions at the genome level are available; however, each method has its own advantages and disadvantages, resulting in false predictions. Here we developed a unique integrated approach to identify the interacting partner(s) of Semaphorin 5A (SEMA5A), beginning with seven proteins that share similar ligand-interacting residues as putative binding partners. The methods include the Dwyer and Root-Bernstein/Dillon theories of protein evolution, hydropathic complementarity of protein structure, patterns of protein function among molecules, information on domain-domain interactions, gene co-expression, and protein evolution. Among the seven proteins selected as putative SEMA5A interacting partners, we found the functions of Plexin B3 and Neuropilin-2 to be associated with SEMA5A. We modeled the semaphorin domain structure of Plexin B3 and found that it shares similarity with SEMA5A. Moreover, a virtual expression database search and RT-PCR analysis showed co-expression of SEMA5A and Plexin B3, and these proteins were found to have co-evolved. In addition, we confirmed the interaction of SEMA5A with Plexin B3 in co-immunoprecipitation studies. Overall, these studies demonstrate that an integrated prediction method can be used at the genome level to discover many unknown binding partners of proteins with known ligand-binding domains.
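Hydropathic complementarity is one of the criteria listed, but the abstract does not say how it was scored. One simple way to quantify it, assumed here, is to compute windowed Kyte-Doolittle hydropathy profiles for two segments and take their Pearson correlation: a strongly negative value means hydrophobic stretches in one segment line up with hydrophilic stretches in the other. The window size and the test segments are arbitrary choices.

```python
# Kyte-Doolittle hydropathy scale
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=3):
    """Sliding-window mean hydropathy."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def hydropathic_complementarity(seq_a, seq_b, window=3):
    """Correlation of aligned profiles; values near -1 suggest complementary hydropathy."""
    pa, pb = hydropathy_profile(seq_a, window), hydropathy_profile(seq_b, window)
    m = min(len(pa), len(pb))
    return pearson(pa[:m], pb[:m])


print(round(hydropathic_complementarity("ILVKDE", "KDEILV"), 3))
```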
Funding (review of protein–protein interaction and complex structure prediction methods): Project supported by the National Natural Science Foundation of China (Grant No. 31670725).
Funding (hybrid ant colony optimization study): the National Natural Science Foundation of China (No. 20475068) and the Guangdong Provincial Natural Science Foundation (No. 031577).
Funding (review of evolutionary algorithms for protein structure prediction): Supported by the National Natural Science Foundation of China (60133010, 70071042, 60073043).
文摘In this paper, the applications of evolutionary algorithm in prediction of protein secondary structure and tertiary structures are introduced, and recent studies on solving protein structure prediction problems using evolutionary algorithms are reviewed, and the challenges and prospects of EAs applied to protein structure modeling are analyzed and discussed.
文摘The hydrophobic-polar (HP) lattice model is an important simplified model for studying protein folding. In this paper, we present an improved ACO algorithm for the protein structure prediction. In the algorithm, the "lone"ethod is applied to deal with the infeasible structures, and the "oint mutation and reconstruction"ethod is applied in local search phase. The empirical results show that the presented method is feasible and effective to solve the problem of protein structure prediction, and notable improvements in CPU time are obtained.
文摘Protein structure prediction is one of the most essential objectives practiced by theoretical chemistry and bioinformatics as it is of a vital importance in medicine,biotechnology and more.Protein secondary structure prediction(PSSP)has a significant role in the prediction of protein tertiary structure,as it bridges the gap between the protein primary sequences and tertiary structure prediction.Protein secondary structures are classified into two categories:3-state category and 8-state category.Predicting the 3 states and the 8 states of secondary structures from protein sequences are called the Q3 prediction and the Q8 prediction problems,respectively.The 8 classes of secondary structures reveal more precise structural information for a variety of applications than the 3 classes of secondary structures,however,Q8 prediction has been found to be very challenging,that is why all previous work done in PSSP have focused on Q3 prediction.In this paper,we develop an ensemble Machine Learning(ML)approach for Q8 PSSP to explore the performance of ensemble learning algorithms compared to that of individual ML algorithms in Q8 PSSP.The ensemble members considered for constructing the ensemble models are well known classifiers,namely SVM(Support Vector Machines),KNN(K-Nearest Neighbor),DT(Decision Tree),RF(Random Forest),and NB(Naïve Bayes),with two feature extraction techniques,namely LDA(Linear Discriminate Analysis)and PCA(Principal Component Analysis).Experiments have been conducted for evaluating the performance of single models and ensemble models,with PCA and LDA,in Q8 PSSP.The novelty of this paper lies in the introduction of ensemble learning in Q8 PSSP problem.The experimental results confirmed that ensemble ML models are more accurate than individual ML models.They also indicated that features extracted by LDA are more effective than those extracted by PCA.
Abstract: The secondary structure of a protein is critical for establishing a link between its primary and tertiary structures. For this reason, it is important to design methods for accurate protein secondary structure prediction. Most existing computational techniques for protein structural and functional prediction are based on machine learning with shallow frameworks, although different deep learning architectures have already been applied to the protein secondary structure prediction problem. In this study, deep learning models, namely a convolutional neural network (CNN) and a long short-term memory (LSTM) network, are proposed for protein secondary structure prediction. The input to the proposed models is amino acid sequences derived from the CulledPDB dataset. Hyperparameter tuning with cross-validation was employed to find the best parameters for the proposed models. The proposed models process amino acid sequences effectively and achieve Q3 accuracies of approximately 87.05% for the CNN and 87.47% for the LSTM model.
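The abstract states only that the models take amino acid sequences as input. The sketch below shows one common way to turn a sequence into network input, assuming 20-dimensional one-hot encoding and fixed-width, zero-padded windows centred on each residue. The window scheme and the treatment of unknown residues (all-zero rows) are assumptions. An LSTM could equally consume the whole encoded sequence instead of windows.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(sequence):
    """Encode a sequence as 20-dim one-hot rows; unknown residues (e.g. X) become all zeros."""
    rows = []
    for aa in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    return rows

def windows(encoded, half_width):
    """One zero-padded window of 2*half_width+1 rows centred on each residue."""
    pad = [[0] * len(AMINO_ACIDS) for _ in range(half_width)]
    padded = pad + encoded + pad
    return [padded[i:i + 2 * half_width + 1] for i in range(len(encoded))]
```

Each window is then labelled with the secondary-structure state of its centre residue, so a sequence of length N yields N training examples.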
Funding: The National Natural Science Foundation of China (No. 10471051) and the National Basic Research Program (973) of China (No. 2004CB318000).
Abstract: A three-dimensional off-lattice protein model with two species of monomers, hydrophobic and hydrophilic, is studied. Inspired by the law of reciprocity among things in the physical world, a heuristic quasi-physical algorithm for the protein structure prediction problem is proposed. First, by carefully simulating the movement of smooth elastic balls in the physical world, the algorithm finds low-energy configurations for a given monomer chain. An "off-trap" strategy is then proposed to escape from local minima. Experimental results show promising performance. For all chains with lengths 13 ≤ n ≤ 55, the proposed algorithm finds states with lower energy than the putative ground states reported in the literature. Furthermore, for chain lengths n = 21, 34, and 55, the algorithm finds new low-energy configurations different from those reported in the literature.
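The abstract does not state the energy function being minimized. The sketch below evaluates an AB-style off-lattice energy, with A as hydrophobic and B as hydrophilic monomers. It has two bond-vector correlation terms and a Lennard-Jones-like term between non-adjacent monomers. The coefficients (kappa1 = -1, kappa2 = 0.5; pair factor 1 for A-A and 0.5 otherwise) are an assumption taken from a widely used 3D AB variant and should be checked against the paper before reuse. The elastic-ball dynamics and the "off-trap" strategy are not reproduced.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ab_energy(seq, coords, kappa1=-1.0, kappa2=0.5):
    """Energy of a 3D off-lattice AB chain (coefficients are assumed, see lead-in).

    seq: string over {'A', 'B'}; coords: list of (x, y, z) positions with
    consecutive monomers at unit distance, so bond vectors are unit vectors.
    """
    n = len(seq)
    bonds = [tuple(coords[i + 1][k] - coords[i][k] for k in range(3)) for i in range(n - 1)]
    bend = -kappa1 * sum(_dot(bonds[i], bonds[i + 1]) for i in range(n - 2))
    second_neighbour = -kappa2 * sum(_dot(bonds[i], bonds[i + 2]) for i in range(n - 3))
    pair = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            c = 1.0 if seq[i] == seq[j] == "A" else 0.5  # assumed pair coefficients
            r = math.dist(coords[i], coords[j])
            pair += 4.0 * c * (r ** -12 - r ** -6)
    return bend + second_neighbour + pair
```

A quasi-physical search such as the one described would move the monomers to lower this value while keeping bond lengths fixed.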
Funding: Financial support from the National Natural Science Foundation of China (Grant Nos. 52225001 and 51978485) and the State Key Laboratory for Pollution Control (China) is acknowledged.
Abstract: The deep-learning protein structure prediction method AlphaFold2 has garnered enormous attention beyond the realm of structural biology for its groundbreaking contribution to solving the "protein folding problem." In this perspective, we explore the connection between protein structure studies and environmental research, delving into the potential for addressing specific environmental challenges. Proteins are promising for environmental applications because of the functional diversity endowed by their structural complexity. However, structural studies on proteins of environmental significance remain scarce. Here, we present the opportunities for studying such proteins offered by advances in experimental structure determination and deep-learning prediction methods. Specifically, we highlight the latest progress in environmental research using cryogenic electron microscopy (cryo-EM), which allows the structures of protein complexes to be determined in their native state within cells at molecular resolution, revealing environmentally associated structural dynamics. With remarkable advances in computational power and experimental resolution, the study of protein structure and dynamics has reached unprecedented depth and accuracy. These advances will accelerate the establishment of comprehensive databases of environmental protein structure and function. Protein engineering offers tremendous opportunities for innovative environmental solutions, such as the degradation of persistent contaminants and the recovery of valuable metals and rare earth elements.
Funding: The National Key R&D Program of China (Grant No. 2020YFA0907000) and the National Natural Science Foundation of China (Grant Nos. 32271297, 62072435, 31770775, and 31671369) provided financial support for this study and the publication charges.
Abstract: Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem. Biochemists and physicists attempt to reveal the principles governing protein folding. Mathematicians, especially statisticians, usually start by assuming a probability distribution of protein structures given a target sequence and then find the most likely structure. Computer scientists formulate protein structure prediction as an optimization problem, i.e., finding the structural conformation with the lowest energy or minimizing the difference between the predicted structure and the native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we survey efforts in protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories that interpret neural networks, and knowledge of protein folding, are still highly desired.
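One common way to quantify "the difference between the predicted structure and the native structure" is the root-mean-square deviation (RMSD) of matching atoms, such as C-alpha atoms, after optimal rigid superposition. The sketch below computes it with the Kabsch algorithm, using NumPy's SVD. The review does not say which measure it means; RMSD is an assumption here, and GDT or TM-score are frequently used instead.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate arrays after optimal rigid superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)                 # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                            # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # forbid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T              # rotation taking P onto Q
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero. A mirror image does not, because the determinant check rules out improper rotations.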
Funding: Supported in part by the National Natural Science Foundation of China (22033001), the National Key R&D Program of China (2022YFA1303700), and the Chinese Academy of Medical Sciences (2021-I2M-5-014).
Abstract: Proteins are integral actors in essential life processes, which makes protein research a fundamental domain with the potential to drive advances in pharmaceuticals and disease research. Within protein research, there is a pressing need to uncover protein functions and untangle their underlying mechanisms. Because experimental investigations are costly and low-throughput, computational models offer a promising alternative for accelerating protein function annotation. In recent years, protein pre-training models have shown notable progress across multiple prediction tasks, which suggests a promising route to tackling the complex downstream task of protein function prediction. In this review, we describe the historical evolution and research paradigms of computational methods for predicting protein function. We then summarize progress in protein and molecule representation and in feature extraction techniques. Furthermore, we assess the performance of machine learning-based algorithms across various objectives in protein function prediction, offering a comprehensive perspective on progress in this field.
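As a concrete example of the hand-crafted sequence features that predate learned protein representations, the sketch below computes amino acid composition (20 values) and dipeptide composition (400 values). These are classic illustrative features, not a method attributed to this review. Non-standard residues are skipped, and a dipeptide is counted only when both of its residues are standard.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def aac(sequence):
    """Amino acid composition: relative frequency of each of the 20 standard residues."""
    residues = [a for a in sequence.upper() if a in AMINO_ACIDS]
    total = len(residues) or 1
    return [residues.count(a) / total for a in AMINO_ACIDS]

def dpc(sequence):
    """Dipeptide composition: relative frequency of each of the 400 adjacent residue pairs."""
    s = sequence.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)
             if s[i] in AMINO_ACIDS and s[i + 1] in AMINO_ACIDS]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    total = len(pairs) or 1
    return [counts[d] / total for d in DIPEPTIDES]
```

Concatenating the two vectors gives a fixed-length, 420-dimensional input for a conventional classifier, regardless of sequence length.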
Abstract: Subcellular localization of proteins can provide key hints for inferring their functions and structures in cells. With recent breakthroughs in molecular imaging techniques, 2D bioimages have become increasingly popular for automatically analyzing protein subcellular location patterns. Compared with the widely used 1D amino acid sequence data, images of protein distribution are more intuitive and interpretable, which makes them a better choice in many applications for revealing the dynamic characteristics of proteins, such as detecting protein translocation and quantifying proteins. In this paper, we systematically review recent progress in automated image-based protein subcellular location prediction and classify it into four categories: the growth of bioimage databases, the description of subcellular location distribution patterns, classification methods, and applications of the prediction systems. We also discuss some potential directions in this field.
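Image-based subcellular location systems often describe distribution patterns with texture features such as Haralick features, which are derived from a gray-level co-occurrence matrix (GLCM). The sketch below builds a directional, non-symmetrized GLCM for one pixel offset and computes the Haralick contrast feature. It is a generic illustration, not a feature set taken from any system in the review. The input image is assumed to be already quantized to `levels` gray levels.

```python
def glcm(image, dx=1, dy=0, levels=8):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dx, dy).

    image: 2D list of ints already quantized to 0..levels-1.
    """
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            yy, xx = y + dy, x + dx
            if 0 <= yy < rows and 0 <= xx < cols:
                m[image[y][x]][image[yy][xx]] += 1
                total += 1
    return [[v / total for v in row] for row in m] if total else m

def contrast(p):
    """Haralick contrast: sum over (i, j) of p(i, j) * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
```

A uniform image gives zero contrast, while a checkerboard of alternating levels gives high contrast. In practice, features from several offsets and directions are combined.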
Funding: Supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) (Grant Nos. URF/1/1976-04 and URF/1/1976-06).
Abstract: The number of protein sequences available in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for understanding how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred from protein–protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three pieces of information are combined by consensus averaging to make the final prediction. Our approach was tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of the human tripartite motif-containing 22 (TRIM22) protein, predicted by QAUST, can be validated experimentally.
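The abstract says that the structure, network, and sequence evidence are combined by consensus averaging, but it gives no details. The sketch below is a minimal version, assuming equal weights, a score of 0 for a term that a source does not report, and a hypothetical acceptance threshold of 0.5. The GO identifiers and scores in the usage note are illustrative only.

```python
def consensus(structure, network, sequence, threshold=0.5):
    """Average per-term scores from three evidence sources and keep terms above a threshold.

    Each argument maps a GO term or EC number to a score in [0, 1]. Equal weights,
    missing-term score 0, and the threshold are assumptions, not QAUST's settings.
    """
    terms = set(structure) | set(network) | set(sequence)
    averaged = {
        t: (structure.get(t, 0.0) + network.get(t, 0.0) + sequence.get(t, 0.0)) / 3
        for t in terms
    }
    return {t: s for t, s in averaged.items() if s >= threshold}
```

For example, a term scored 0.9, 0.6, and 0.6 by the three sources averages 0.7 and is kept. A term supported strongly by only one source (0.3, missing, 0.9) averages 0.4 and is dropped, which is the point of requiring consensus.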
Funding: This research is supported in part by HKRGC Grant 7017/07P, HKU CRCG Grants, the HKU strategic theme grant on computational sciences, the HKU Hung Hing Ying Physical Science Research Grant, National Natural Science Foundation of China Grant No. 10971075, and Guangdong Provincial Natural Science Grant No. 9151063101000021. A preliminary version of this paper was presented at the OSB2009 conference and published in the corresponding conference proceedings [25]. The authors would like to thank the anonymous referees for their helpful comments and suggestions.
Abstract: Predicting protein functions is an important issue in the post-genomic era. This paper studies several network-based kernels, including the locally linear embedding (LLE) kernel, the diffusion kernel, and the Laplacian kernel, to uncover the relationship between protein functions and protein–protein interactions (PPI). The authors first construct kernels from PPI networks and then apply support vector machine (SVM) techniques to classify proteins into different functional groups. Five-fold cross-validation is then applied to 359 selected GO terms to compare the performance of the different kernels with that of guilt-by-association methods, including neighbor counting and Chi-square methods. Finally, the authors predict the functions of some unknown genes and partially verify these predictions using information from other data sources.
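To illustrate one of the kernels named above, the sketch below computes the diffusion kernel K = exp(-beta * L) of an undirected PPI graph. Here L = D - A is the graph Laplacian, and the matrix exponential is obtained through an eigendecomposition with NumPy. The diffusion parameter `beta` and the toy 4-protein path graph are assumptions for illustration. The LLE and Laplacian kernels from the paper are not shown.

```python
import numpy as np

def diffusion_kernel(adjacency, beta=1.0):
    """Diffusion kernel K = exp(-beta * L) of an undirected graph, with L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A           # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)     # L is symmetric, so eigh applies
    return eigvecs @ np.diag(np.exp(-beta * eigvals)) @ eigvecs.T

# Toy PPI network: a path P0 - P1 - P2 - P3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
K = diffusion_kernel(A, beta=1.0)
```

K is symmetric and positive definite, and proteins that are closer in the network receive larger kernel values. Such a matrix could be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, with one binary classifier per GO term.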
Funding: Partially supported by the National Council of Science and Technology of México (CONACyT) under Grant Nos. 105060 and 99276.
Abstract: The HP model for protein structure prediction abstracts the fact that hydrophobicity is a dominant force in the protein folding process. This challenging combinatorial optimization problem has been widely addressed with metaheuristics. The evaluation function is a key component for the success of metaheuristics, and the poor discrimination of the conventional HP evaluation function has motivated alternative formulations of this component. This comparative analysis examines the effectiveness of seven different evaluation functions for the HP model. For each function, we analyze the degree of discrimination it provides, its capability to preserve a rank ordering among potential solutions that is consistent with the original objective of the HP model, and its effect on the performance of local search methods. The results indicate that studying alternative evaluation schemes for the HP model is a highly valuable direction that merits more attention.
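The "poor discrimination" of the conventional evaluation function can be seen directly. Its value is minus the number of non-consecutive H-H pairs on adjacent lattice sites, so it takes only a few distinct values across a very large number of conformations. The sketch below enumerates all 2D self-avoiding walks for a short, arbitrarily chosen sequence, with the first step fixed to remove rotational copies, and tallies their energies. None of the seven alternative functions from the paper are reproduced.

```python
from collections import Counter

def hp_energy(seq, coords):
    """Conventional HP energy: -1 per non-consecutive H-H pair on adjacent lattice sites."""
    pos = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for nb in ((x + 1, y), (x, y + 1)):  # +x and +y only, so each pair is seen once
            j = pos.get(nb)
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                contacts += 1
    return -contacts

def self_avoiding_walks(n):
    """All 2D self-avoiding walks with n >= 2 sites; the first step is fixed to +x."""
    walks = []
    def extend(path):
        if len(path) == n:
            walks.append(list(path))
            return
        x, y = path[-1]
        for sx, sy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + sx, y + sy)
            if nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()
    extend([(0, 0), (1, 0)])
    return walks

seq = "HPHPPHHPHH"  # arbitrary example sequence
walks = self_avoiding_walks(len(seq))
histogram = Counter(hp_energy(seq, w) for w in walks)
print(len(walks), "conformations,", len(histogram), "distinct energy values")
```

With thousands of conformations collapsing onto a handful of energy levels, a local search sees large plateaus with no gradient. Alternative evaluation functions aim to break exactly these ties.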
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 90103031, 10474041, 90403120, and 10021001) and the Nonlinear Project (973) of the NSM.
Abstract: The folding dynamics and structural characteristics of the peptides RTKAWNRQLYPEW (P1) and RTKQLYPEW (P2) are investigated in this work using all-atom simulations with CHARMM. The results show that P1, a segment of an antigen, adopts an α-helical folding motif. P2 is derived by deleting the four residues AWNR from P1; it does not form a helix and instead adopts a β-strand. In addition, P1 experiences a more rugged energy landscape than P2. From these results, it is inferred that CD8 cytolytic T lymphocytes prefer an antigen with a β-folded structure to one with an α-helical structure.
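The abstract does not describe how helix and strand content were assigned from the trajectories. A rough way to summarize such simulations is to label each residue by its backbone dihedrals (phi, psi) and report the fraction of residues in each region. The sketch below does this, assuming the angles have already been extracted from the CHARMM frames. The rectangular region bounds are a coarse assumption and do not replace a standard assignment method such as DSSP.

```python
from collections import Counter

def classify_residue(phi, psi):
    """Coarse Ramachandran label from backbone dihedrals in degrees (bounds are assumptions)."""
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"
    return "other"

def profile(dihedrals):
    """Fraction of residues in each coarse region for one frame: list of (phi, psi) pairs."""
    counts = Counter(classify_residue(phi, psi) for phi, psi in dihedrals)
    n = len(dihedrals)
    return {label: counts[label] / n for label in ("alpha", "beta", "other")}
```

Averaging `profile` over all frames of a trajectory gives a simple per-peptide helix versus strand propensity that can be compared between P1 and P2.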
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61872094 and 62272105), the Shanghai Municipal Science and Technology Major Project (Grant No. 2018SHZDZX01), the ZJ Lab, and the Shanghai Research Center for Brain Science and Brain-Inspired Intelligence Technology. Shaojun Wang and Ronghui You have been supported by the 111 Project (Grant No. B18015), the Shanghai Municipal Science and Technology Major Project (Grant No. 2017SHZDZX01), and the Information Technology Facility, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences. Yi Xiong has been supported by the National Natural Science Foundation of China (Grant Nos. 61832019 and 62172274).
Abstract: As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve performance. However, it mainly uses proteins with experimentally supported functional annotations and does not leverage the valuable information in the vast number of unannotated proteins. Recently, protein language models have been proposed that learn informative representations [e.g., Evolutionary Scale Modeling (ESM)-1b embeddings] from protein sequences through self-supervision. Here, we represented each protein by its ESM-1b embedding and used logistic regression (LR) to train a new AFP model, LR-ESM. The experimental results showed that LR-ESM achieved performance comparable to that of the best-performing component of NetGO 2.0. By incorporating LR-ESM into NetGO 2.0, we therefore developed NetGO 3.0, which substantially improves AFP performance.
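The LR-ESM idea, a logistic regression classifier on fixed per-protein embeddings, can be sketched for a single GO term as below. This version uses plain batch gradient descent in NumPy with L2 regularization. The learning rate, epoch count, regularization strength, 16-dimensional random stand-in embeddings, and synthetic labels are all assumptions; real ESM-1b embeddings are 1280-dimensional and are computed separately. The abstract does not say how LR-ESM was trained, so the optimizer is also an assumption.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Binary logistic regression by batch gradient descent (one model per GO term)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # predicted probabilities
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)  # log-loss gradient plus L2 penalty
        b -= lr * float(np.mean(p - y))
    return w, b

def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ w + b)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))   # stand-in for per-protein embeddings
y = (X[:, 0] > 0).astype(float)  # synthetic labels for one GO term
w, b = train_logistic(X, y)
accuracy = np.mean((predict_proba(X, w, b) >= 0.5) == y)
```

For multi-label AFP, one such model would be trained per GO term, and its probabilities used as the term scores that a system such as NetGO combines with its other components.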
Funding: Supported by the National Natural Science Foundation of China (Nos. 90203011 and 30370354) and the Ministry of Education of China (Nos. 505010 and CG2003-GA002).
Abstract: Three-dimensional (3D) structure prediction of proteins is an important task in bioinformatics. Finding energy functions that better represent residue–residue and residue–solvent interactions is a crucial way to improve prediction accuracy. Widely used contact energy functions mostly consider only the contact frequency between different types of residues; however, we find that contact frequency also depends on the residue's hydrophobic environment. Accordingly, we present an improved contact energy function that integrates both factors and thus reflects the influence of hydrophobic interactions on the stabilization of protein 3D structure more effectively. Furthermore, we develop a fold recognition (threading) approach based on this energy function. Tests on 20 randomly selected proteins show that, compared with common contact energy functions, the proposed energy function improves the accuracy of fold template prediction from 20% to 50% and the accuracy of sequence–template alignment from 35% to 65%.
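To make the baseline concrete, the sketch below scores a query sequence threaded onto template contact maps with a conventional pairwise contact energy, then picks the template with the lowest total. Several simplifications are assumed: a gapless alignment, a two-class residue alphabet, a hypothetical energy table, and an assumed set of hydrophobic residues. Real tables are 20 x 20, and the paper's environment-dependent term, which would make the lookup depend on each residue's hydrophobic surroundings, is not reproduced.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic residue set

def residue_class(aa):
    return "H" if aa in HYDROPHOBIC else "P"

# Hypothetical contact energies (lower is more favourable), keyed by sorted class pair.
TOY_TABLE = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): -0.2}

def contact_energy(seq, contacts, table=TOY_TABLE):
    """Sum of pairwise contact energies over template contacts (i, j), gapless alignment."""
    total = 0.0
    for i, j in contacts:
        key = tuple(sorted((residue_class(seq[i]), residue_class(seq[j]))))
        total += table[key]
    return total

def best_template(seq, templates):
    """templates: dict mapping a template name to its contact list; lowest energy wins."""
    return min(templates, key=lambda name: contact_energy(seq, templates[name]))
```

Fold recognition then amounts to ranking templates by this score. The reported gains come from replacing the fixed table lookup with one that also accounts for the hydrophobic environment.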
Funding: This work was partly supported by the Molecular Therapeutics Program, Nebraska Department of Health and Human Services, by Grant CA72781 (to RKS), and by Cancer Center Support Grant P30CA036727 from the National Cancer Institute, National Institutes of Health, USA.
Abstract: In the post-genomic era, various computational methods are available to predict protein–protein interactions at the genome level; however, each method has its own advantages and disadvantages, which leads to false predictions. Here, we developed a unique integrated approach to identify the interacting partner(s) of Semaphorin 5A (SEMA5A), starting from seven proteins that share similar ligand-interacting residues as putative binding partners. The methods include the Dwyer and Root-Bernstein/Dillon theories of protein evolution, hydropathic complementarity of protein structure, the pattern of protein functions among molecules, information on domain–domain interactions, gene co-expression, and protein evolution. Among the seven proteins selected as putative SEMA5A-interacting partners, we found the functions of Plexin B3 and Neuropilin-2 to be associated with SEMA5A. We modeled the semaphorin domain structure of Plexin B3 and found that it shares similarity with SEMA5A. Moreover, a virtual expression database search and RT-PCR analysis showed co-expression of SEMA5A and Plexin B3, and these proteins were found to have co-evolved. In addition, we confirmed the interaction of SEMA5A with Plexin B3 in co-immunoprecipitation studies. Overall, these studies demonstrate that an integrated prediction method can be used at the genome level to discover many unknown binding partners of proteins with known ligand-binding domains.
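Hydropathic complementarity is the idea that interacting sequences tend to have inverted hydropathy profiles. The sketch below scores it as the negated Pearson correlation between sliding-window Kyte-Doolittle hydropathy profiles of two sequences, truncated to a common length. The window size and the correlation-based score are assumptions; the abstract does not give the scoring used in the study. Sequences must contain only the 20 standard residues.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=5):
    """Sliding-window mean Kyte-Doolittle hydropathy (standard residues only)."""
    values = [KYTE_DOOLITTLE[a] for a in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def complementarity(seq1, seq2, window=5):
    """Negated correlation of hydropathy profiles: +1 means perfectly inverted profiles."""
    p1, p2 = hydropathy_profile(seq1, window), hydropathy_profile(seq2, window)
    n = min(len(p1), len(p2))
    return -pearson(p1[:n], p2[:n])
```

The same `pearson` helper could be applied to paired expression vectors as a simple co-expression measure, another of the evidence types listed in the abstract.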