Membrane proteins are an important class of proteins embedded in cell membranes; they play crucial roles in living organisms as ion channels, transporters, and receptors. Because it is difficult to determine the structure of a membrane protein by wet-lab experiments, accurate and fast computational methods based on the amino acid sequence are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue–residue contacts, and relative accessible surface area of α-helical membrane proteins. MemBrain achieves a prediction accuracy of 97.9% for ATMH and 87.1% for AP, with an N-score of 3.2 ± 3.0 and a C-score of 3.1 ± 2.8. MemBrain-Contact obtains 62%/64.1% prediction accuracy on the training and independent datasets, respectively, for top L/5 contact prediction. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
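The three evaluation measures quoted in this abstract (Pearson correlation coefficient, mean absolute error, and top L/5 contact accuracy) can be illustrated with a minimal sketch. The function names and the toy inputs are assumptions made for illustration only; they are not part of MemBrain, and the abstract does not say how the authors computed these figures.

```python
import math


def pearson_correlation(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def top_l_over_5_precision(scored_pairs, true_contacts, seq_len):
    """Fraction of the L/5 highest-scoring residue pairs that are true contacts.

    scored_pairs: list of ((i, j), score) tuples (hypothetical input format).
    true_contacts: set of (i, j) tuples taken from a known structure.
    seq_len: sequence length L.
    """
    k = max(1, seq_len // 5)
    top = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    hits = sum(1 for pair, _ in top if pair in true_contacts)
    return hits / len(top)


# Toy usage with made-up numbers (not data from the paper).
pairs = [((1, 8), 0.9), ((2, 7), 0.8), ((3, 6), 0.1)]
contacts = {(1, 8), (3, 6)}
precision = top_l_over_5_precision(pairs, contacts, seq_len=10)
```

With L = 10 the sketch keeps the two highest-scoring pairs, so the precision is the share of those two that appear in the reference contact set.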
Protein structure prediction is one of the most essential objectives of theoretical chemistry and bioinformatics, as it is of vital importance in medicine, biotechnology, and more. Protein secondary structure prediction (PSSP) plays a significant role in predicting protein tertiary structure, as it bridges the gap between protein primary sequences and tertiary structure prediction. Protein secondary structures are classified under two schemes: a 3-state scheme and an 8-state scheme. Predicting the 3 states and the 8 states of secondary structure from protein sequences are called the Q3 and Q8 prediction problems, respectively. The 8 classes reveal more precise structural information for a variety of applications than the 3 classes; however, Q8 prediction has been found to be very challenging, which is why all previous work in PSSP has focused on Q3 prediction. In this paper, we develop an ensemble machine learning (ML) approach for Q8 PSSP to explore the performance of ensemble learning algorithms compared with that of individual ML algorithms. The ensemble members considered for constructing the ensemble models are well-known classifiers, namely SVM (Support Vector Machines), KNN (K-Nearest Neighbors), DT (Decision Tree), RF (Random Forest), and NB (Naïve Bayes), with two feature extraction techniques, namely LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis). Experiments were conducted to evaluate the performance of single models and ensemble models, with PCA and LDA, in Q8 PSSP. The novelty of this paper lies in the introduction of ensemble learning to the Q8 PSSP problem. The experimental results confirmed that ensemble ML models are more accurate than individual ML models. They also indicated that features extracted by LDA are more effective than those extracted by PCA.
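To make the Q3/Q8 distinction concrete, here is a minimal sketch that scores one prediction under both schemes. The 8-to-3 reduction table is the commonly used DSSP grouping (H, G, I to helix; E, B to strand; everything else to coil) and is an assumption here, since the abstract does not state which mapping the authors used; the example strings are invented.

```python
# Assumed DSSP-style reduction from 8 states to 3 states.
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",            # helices
    "E": "E", "B": "E",                      # strands and bridges
    "T": "C", "S": "C", "L": "C", "-": "C",  # turns, bends, loops -> coil
}


def reduce_to_3_state(ss8):
    """Map an 8-state secondary structure string to its 3-state form."""
    return "".join(DSSP8_TO_3[s] for s in ss8)


def per_residue_accuracy(predicted, true):
    """Fraction of residues whose predicted state matches the true state."""
    if len(predicted) != len(true):
        raise ValueError("sequences must have the same length")
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


# Invented example: 10 residues, 3 of which are wrong at the 8-state level.
true_ss8 = "HHHGGTEEBL"
pred_ss8 = "HHHHHTEEEL"

q8 = per_residue_accuracy(pred_ss8, true_ss8)
q3 = per_residue_accuracy(reduce_to_3_state(pred_ss8), reduce_to_3_state(true_ss8))
```

In this example every error confuses two states that fall in the same 3-state class, so the Q3 score is perfect while the Q8 score is not, which is one way to see why Q8 is the harder problem.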
Secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is widely needed given the increasing discovery of noncoding RNAs. Over the last decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has hit a bottleneck. Here we present a method for RNA secondary structure prediction using direct coupling analysis and a remove-and-expand algorithm that shows better performance than four existing popular multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
Algorithms based on combination learning are usually superior to a single classification algorithm on the task of protein secondary structure prediction. However, the assignment of weights to the base classifiers usually lacks decision-making evidence. In this paper, we propose a protein secondary structure prediction method with a dynamic self-adaptive combination strategy based on entropy, where the weights are assigned according to the entropy of the posterior probabilities output by the base classifiers. A higher entropy value means a lower weight for the base classifier. The final structure prediction is decided by the weighted combination of posterior probabilities. Extensive experiments on the CB513 dataset demonstrate that the proposed method outperforms existing methods and can effectively improve prediction performance.
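The entropy-based weighting idea can be sketched as follows. The abstract only states that higher entropy means a lower weight; the specific rule used below (weight proportional to the maximum possible entropy minus the observed entropy) is an assumption chosen for illustration, not the authors' formula, and all names and numbers are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weights(posteriors):
    """Give each classifier a weight that shrinks as its entropy grows.

    Assumed rule: weight_i is proportional to log(K) - H_i, where K is the
    number of classes and H_i the entropy of classifier i's posterior.
    """
    max_entropy = math.log(len(posteriors[0]))
    confidences = [max(0.0, max_entropy - shannon_entropy(p)) for p in posteriors]
    total = sum(confidences)
    if total < 1e-12:
        # Every classifier is maximally uncertain: fall back to equal weights.
        return [1.0 / len(posteriors)] * len(posteriors)
    return [c / total for c in confidences]


def combine(posteriors):
    """Weighted combination of the posteriors for a single residue."""
    weights = entropy_weights(posteriors)
    n_classes = len(posteriors[0])
    combined = [
        sum(w * p[k] for w, p in zip(weights, posteriors))
        for k in range(n_classes)
    ]
    return combined, weights


# Posteriors over (helix, strand, coil) from three hypothetical classifiers.
posteriors = [
    [0.90, 0.05, 0.05],  # confident, low entropy
    [0.40, 0.30, 0.30],  # nearly uniform, high entropy
    [0.10, 0.80, 0.10],  # fairly confident
]
combined, weights = combine(posteriors)
predicted_class = max(range(len(combined)), key=combined.__getitem__)
```

Because the weights are recomputed for each residue from that residue's posteriors, the combination adapts dynamically: a classifier that is uncertain at one position contributes little there but can still dominate elsewhere.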
Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start by assuming a probability distribution of protein structures given a target sequence and then find the most likely structure; and computer scientists formulate protein structure prediction as an optimization problem: finding the structural conformation with the lowest energy or minimizing the difference between the predicted structure and the native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we present a survey of the efforts in protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories interpreting the neural networks and knowledge of protein folding are still highly desired.
RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help to understand RNA structures and their stabilizing factors, thus providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although their use in protein-related fields has a long history, the use of ML methods in predicting RNA tertiary structures is new and rare. Here, we review recent advances in using ML methods for RNA structure prediction and discuss the advantages and limitations, and the difficulties and potential, of these approaches when applied in the field.
Building on structure prediction methods, machine learning is used instead of density functional theory (DFT) to predict material properties, thereby accelerating the material search process. In this paper, we established a data set of carbon materials by high-throughput calculation, using available carbon structures obtained from the Samara Carbon Allotrope Database. We then trained a machine learning (ML) model that specifically predicts the elastic moduli (bulk modulus, shear modulus, and Young's modulus) and confirmed that its accuracy is better than that of AFLOW-ML in predicting the elastic moduli of a carbon allotrope. We further combined our ML model with the CALYPSO code to search for new carbon structures with a high Young's modulus. A new carbon allotrope not included in the Samara Carbon Allotrope Database, named Cmcm-C24, which exhibits a hardness greater than 80 GPa, was revealed for the first time. The Cmcm-C24 phase was identified as a semiconductor with a direct bandgap. The structural stability, elastic moduli, and electronic properties of the new carbon allotrope were systematically studied, and the results demonstrate the feasibility of using ML methods to accelerate the material search process.
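For an isotropic (for example, polycrystalline-averaged) solid, the three moduli named in this abstract are linked by standard elasticity relations. The sketch below shows that link; whether the authors' model predicts Young's modulus directly or derives it from the bulk and shear moduli is not stated in the abstract, and the numerical inputs are illustrative values, not results from the paper.

```python
def youngs_modulus(bulk, shear):
    """Young's modulus E from bulk modulus K and shear modulus G (isotropic solid).

    E = 9 K G / (3 K + G); all three quantities share the same unit (e.g. GPa).
    """
    return 9.0 * bulk * shear / (3.0 * bulk + shear)


def poisson_ratio(bulk, shear):
    """Poisson's ratio from K and G: nu = (3K - 2G) / (2 (3K + G))."""
    return (3.0 * bulk - 2.0 * shear) / (2.0 * (3.0 * bulk + shear))


# Illustrative, diamond-like inputs in GPa (assumed, not taken from the paper).
K, G = 443.0, 535.0
E = youngs_modulus(K, G)
nu = poisson_ratio(K, G)
```

The two outputs are consistent with the companion identity E = 2G(1 + nu), which gives a quick sanity check on predicted moduli.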
Crystal structure prediction algorithms have become powerful tools for materials discovery in recent years; however, they are usually limited to relatively small systems. The main challenge is that the number of local minima grows exponentially with system size. In this work, we propose two crossover-mutation schemes based on graph theory that accelerate evolutionary structure searching through automatic decomposition. These schemes can detect molecules or clusters inside periodic networks using quotient graphs for crystals, and the decomposition can dramatically reduce the search space. Tests on a range of examples, including the high-pressure phases of methane, ammonia, MgAl2O4, and boron, show that these new evolution schemes can significantly improve the success rate and search efficiency compared with the standard method in both isolated and extended systems.
Mineral apatite compounds have attracted significant interest due to their chemical stability and adjustable hexagonal structure, which make them suitable as new photovoltaic functional materials. The band gap of natural apatite is ~5.45 eV, and such a large value limits its applications in catalysis and energy devices. In this research, we designed a method to narrow the band gap via the tetrahedral substitution effect in apatite-based compounds. Density functional theory (DFT) calculations and experimental investigation of the electronic and optical properties revealed that the continuous incorporation of [MO_(4)]^(4-) tetrahedra (M = Si, Ge, Sn, and Mn) into the crystal lattice can significantly reduce the band gap. In particular, this phenomenon was observed when the [MnO_(4)]^(4-) tetrahedron replaces the [PO_(4)]^(4-) tetrahedron, because a Mn 3d-derived conduction band minimum (CBM) forms and interacts with other elements, leading to band broadening and an obvious reduction of the band gap. This approach allowed us to propose a novel scheme for band gap engineering of apatite-based compounds toward modification across the entire spectral range.
The secondary structure of a protein is critical for establishing a link between the protein's primary and tertiary structures. For this reason, it is important to design methods for accurate protein secondary structure prediction. Most existing computational techniques for protein structural and functional prediction are based on machine learning with shallow frameworks. Different deep learning architectures have already been applied to the protein secondary structure prediction problem. In this study, deep learning based models, i.e., a convolutional neural network and a long short-term memory network, were proposed for protein secondary structure prediction. The input to the proposed models is amino acid sequences derived from the CulledPDB dataset. Hyperparameter tuning with cross-validation was employed to obtain the best parameters for the proposed models. The proposed models enable effective processing of amino acid sequences and attain approximately 87.05% and 87.47% Q3 accuracy for the convolutional neural network and long short-term memory models, respectively.
As the Welch-Berlekamp (W-B) theorem accurately predicts the structure of the error locator polynomials of error patterns, it gives rise to the Welch-Berlekamp algorithm for decoding cyclic codes. However, it is only valid within the BCH bound. Here, a prediction formula for error locator determination is presented, based on a study of the theory of the minimal homogeneous interpolation problem, which extends the Welch-Berlekamp theorem and expands the Welch-Berlekamp algorithm so that the constraint from the BCH...
Four isomers of the three-dimensionally connected bare boron cationic cluster B were investigated using ab initio molecular orbital theory at the HF/6-31G level. The results show that the D5h-symmetric isomer of B is a possible candidate for a stable geometry with a closed structure.
RNAs play crucial and versatile roles in cellular biochemical reactions. Since experimental approaches to determining their three-dimensional (3D) structures are costly and inefficient, it is greatly advantageous to develop computational methods to predict RNA 3D structures. For these methods, designing a model or scoring function for structure quality assessment is an essential but challenging step. In this study, we designed and trained a deep learning model to tackle this problem. The model is based on a graph convolutional network (GCN) and is named RNAGCN. The model provides a natural way of representing RNA structures, avoids complex algorithms to preserve atomic rotational equivalence, and is capable of extracting features automatically from structural patterns. Test results on two datasets demonstrated that RNAGCN performs similarly to or better than four leading scoring functions. Our approach provides an alternative way of assessing RNA tertiary structures and may facilitate RNA structure prediction. RNAGCN can be downloaded from https://gitee.com/dcw-RNAGCN/rnagcn.
Superconductive properties of oxides were predicted by an artificial neural network (ANN) method with structural and chemical parameters as inputs. The predicted properties include superconductivity of oxides, distributed ranges of the superconductive transition temperature (T_c) for complex oxides, and T_c values for cuprate superconductors. The calculated results indicated that the adjusted ANN can be used to predict superconductive properties of unknown oxides.
The reaction of water with other materials has been a central topic in high-pressure physics research, because the Earth, super-Earths, Uranus, Neptune, and other planets contain a great amount of water in their interiors. However, the reaction between star-rich MgO and water under ultra-high pressure remains poorly understood. Here, using ab initio evolutionary structure prediction for the MgO-H_(2)O system at 300 GPa-600 GPa, we find that (MgO)_(2)H_(2)O and MgO(H_(2)O)_(2) could become stable. The (MgO)_(2)H_(2)O compound may be an important component of super-Earths and of the ice-rock boundary of Uranus and Neptune. Furthermore, it may act as a reservoir under high pressure before the formation of the core of the Earth or of other super-Earths. The current findings could expand our knowledge and improve our understanding of the evolution and composition of planets.
Atomization energy (AE), the energy change when a polyatomic molecule decomposes into its constituent atoms, is an important indicator of material stability and reactivity. Predicting AE from the structural information of molecules has been a focus of research, but existing methods have limitations such as being time-consuming or requiring complex preprocessing and large amounts of training data. Deep learning (DL), a new branch of machine learning (ML), has shown promise in learning the internal rules and hierarchical representations of sample data, making it a potential solution for AE prediction. To address this problem, we propose a natural-parameter network (NPN) approach for AE prediction. This method establishes a clearer statistical interpretation of the relationship between the network's output and the given data. We use the Coulomb matrix (CM) method to represent each compound as a structural information matrix. Furthermore, we designed an end-to-end predictive model. Experimental results demonstrate that our method achieves excellent performance on the QM7 and BC2P datasets, and the mean absolute error (MAE) obtained on the QM7 test set ranges from 0.2 kcal/mol to 3 kcal/mol. The best result of our method is approximately an order of magnitude more accurate than the 3 kcal/mol reported in published works. Additionally, our approach significantly shortens the prediction time. Overall, this study presents a promising approach to accelerating structure-based prediction using DL and provides a valuable contribution to the field of chemical energy prediction.
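The Coulomb matrix representation mentioned in this abstract can be sketched in a few lines. The formula used is the standard definition (diagonal 0.5 Z^2.4, off-diagonal Z_i Z_j / |R_i - R_j|, in atomic units); the abstract does not say whether the authors sort, pad, or otherwise post-process the matrix, so none of that is shown, and the H2 geometry is an illustrative assumption.

```python
import math


def coulomb_matrix(charges, positions):
    """Standard Coulomb matrix of a molecule.

    charges: nuclear charges Z_i.
    positions: Cartesian coordinates R_i in bohr (atomic units).
    C[i][i] = 0.5 * Z_i ** 2.4
    C[i][j] = Z_i * Z_j / |R_i - R_j|   for i != j
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Illustrative input: H2 with an assumed bond length of 1.4 bohr.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
```

The matrix depends only on nuclear charges and interatomic distances, so it is unchanged by translating or rotating the molecule, which is what makes it usable as a fixed structural input to a regression model.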
The discovery of novel materials with desired properties is essential to the advancement of energy-related technologies. Despite the rapid development of computational infrastructure and theoretical approaches, progress so far has been limited by the empirical and serial nature of experimental work. Fortunately, the situation is changing thanks to the maturation of theoretical tools such as density functional theory, high-throughput screening, crystal structure prediction, and emerging approaches based on machine learning. Together, these recent innovations in computational chemistry, data informatics, and machine learning have acted as catalysts for revolutionizing material design and will hopefully lead to faster kinetics in the development of energy-related industries. In this report, recent advances in material discovery methods for energy devices are reviewed. Three paradigms, based on empiricism-driven experiments, database-driven high-throughput screening, and data informatics-driven machine learning, are discussed critically. Key methodological advancements are reviewed, including high-throughput screening, crystal structure prediction, and generative models for target material design. Their applications in energy-related devices such as batteries, catalysts, and photovoltaics are selectively showcased.
As a fundamental thermodynamic variable, pressure can alter bonding patterns and drive phase transitions, leading to the creation of new high-pressure phases with exotic properties that are inaccessible at ambient pressure. Using the swarm intelligence structure prediction method, the phase transition of TiF_(3) from the R-3c phase to the Pnma phase was predicted at high pressure, accompanied by the destruction of TiF_(6) octahedra and the formation of TiF_(8) square antiprismatic units. The Pnma phase of TiF_(3), formed using the laser-heated diamond-anvil-cell technique, was confirmed via high-pressure x-ray diffraction experiments. Furthermore, in situ electrical measurements indicate that the newly found Pnma phase has a semiconducting character, which is consistent with the electronic band structure calculations. Finally, it was shown that this pressure-induced phase transition is a general phenomenon in ScF_(3), VF_(3), CrF_(3), and MnF_(3), offering valuable insights into the high-pressure phases of transition metal trifluorides.
SiO-based materials are promising alloying and conversion-type anode materials for lithium-ion batteries and have recently been found to be excellent dendrite-proof layers for lithium-metal batteries. However, only a small fraction of the Li-Si-O compositional space has been reported, significantly impeding the understanding of the phase transition mechanisms and the rational design of these materials, both as anodes and as protection layers for lithium-metal anodes. Herein, we identify three new thermodynamically stable phases within the Li-Si-O ternary system (Li_(2)SiO_(5), Li_(4)SiO_(6), and Li_(4)SiO_(8)), in addition to the existing records, via first-principles calculations. Electronic structure simulations show that the Li_(2)SiO_(5) and Li_(4)SiO_(8) phases are metallic in nature, ensuring the high electronic conductivity required of electrodes. Moduli calculations demonstrate that the mechanical strength of the Li-Si-O phases is much higher than that of lithium metal. The diffusion barriers of interstitial Li range from 0.1 to 0.6 eV, and interstitial Li hopping, rather than vacancy diffusion, serves as the dominant diffusion mechanism in the Li-Si-O ternary system. These findings provide a new strategy for the future discovery of improved alloying anodes for lithium-ion batteries and offer important insight into the phase transformation mechanism of alloy-type protection layers on lithium-metal anodes.
A carbonic anhydrase (CA) transcript was obtained from the contig library according to the published sequencing information for buckwheat transcripts. The full length of the CA gene was amplified by reverse transcription PCR (RT-PCR). Bioinformatics analysis showed that the full length of the FsCA1 gene was 1233 bp, with an open reading frame of 978 bp encoding 325 amino acids. The molecular weight was 35.11 kDa and the isoelectric point was 7.59. The protein had 9 α-helices, 6 β-sheets, and many random coils and extended strands; it contained one signal peptide and one transmembrane region, and two conserved amino acid domains typical of beta-type carbonic anhydrase. Subcellular localization prediction showed that the protein is most likely located in the chloroplast. A three-dimensional structural model of FsCA1 was built by homology modeling, indicating that the homo-octamers of buckwheat CA and pea CA match well, so it can be inferred that buckwheat CA is also a homo-octamer. Real-time quantitative PCR was used to detect the expression of FsCA1 in different organs of buckwheat. The results showed that FsCA1 had the highest expression level in leaves, followed by stems, and the lowest in roots.
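The sequence figures quoted in this abstract are consistent with one another, as a short arithmetic check shows. The sketch assumes that the 978 bp open reading frame includes the stop codon (the abstract does not say so explicitly); the function name is hypothetical.

```python
def orf_protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an open reading frame of orf_bp bases."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon terminates translation and encodes no residue.
    return codons - 1 if includes_stop else codons


residues = orf_protein_length(978)             # 326 codons, one of them a stop
mean_residue_mass = 35.11 * 1000 / residues    # Da per residue, from 35.11 kDa
```

Under that assumption the ORF yields 325 residues, matching the abstract, and the implied mean residue mass of about 108 Da is close to the usual rule-of-thumb value of roughly 110 Da per amino acid.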
Funding (MemBrain web server study): Supported by the National Natural Science Foundation of China (Nos. 61671288, 91530321, 61603161) and the Science and Technology Commission of Shanghai Municipality (Nos. 16JC1404300, 17JC1403500, 16ZR1448700).
Funding (RNA secondary structure prediction study using direct coupling analysis): Project supported by the National Natural Science Foundation of China (Grant No. 31570722).
Funding (review of protein structure prediction paradigms): The authors thank the National Key R&D Program of China (Grant No. 2020YFA0907000) and the National Natural Science Foundation of China (Grant Nos. 32271297, 62072435, 31770775, and 31671369) for providing financial support for this study and publication charges.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11774158, 11974173, 11774157, and 11934008).
Abstract: RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help to understand RNA structures and their stabilizing factors, thus providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although their usage in protein-related fields has a long history, the use of ML methods in predicting RNA tertiary structures is new and rare. Here, we review recent advances in using ML methods for RNA structure prediction and discuss the advantages and limitations, as well as the difficulties and potential, of these approaches when applied in the field.
Funding: This work was financially supported by the Fundamental Research Funds for the Central Universities, the National Natural Science Foundation of China (Grant Nos. 11965005 and 11964026), the 111 Project (No. B17035), and the Natural Science Basic Research Plan in Shaanxi Province of China (Grant Nos. 2020JM-186 and 2020JM-621).
Abstract: Based on a structure prediction method, machine learning is used instead of density functional theory (DFT) to predict material properties, thereby accelerating the material search process. In this paper, we established a data set of carbon materials by high-throughput calculation with available carbon structures obtained from the Samara Carbon Allotrope Database. We then trained a machine learning (ML) model that specifically predicts the elastic moduli (bulk modulus, shear modulus, and Young's modulus) and confirmed that its accuracy is better than that of AFLOW-ML in predicting the elastic moduli of a carbon allotrope. We further combined our ML model with the CALYPSO code to search for new carbon structures with a high Young's modulus. A new carbon allotrope not included in the Samara Carbon Allotrope Database, named Cmcm-C24, which exhibits a hardness greater than 80 GPa, was revealed for the first time. The Cmcm-C24 phase was identified as a semiconductor with a direct bandgap. The structural stability, elastic moduli, and electronic properties of the new carbon allotrope were systematically studied, and the obtained results demonstrate the feasibility of using ML methods to accelerate the material search process.
Funding: Support from the National Natural Science Foundation of China (Grant Nos. 11974162 and 11834006), the National Key R&D Program of China (Grant No. 2016YFA0300404), and the Fundamental Research Funds for the Central Universities.
Abstract: Crystal structure prediction algorithms have become powerful tools for materials discovery in recent years; however, they are usually limited to relatively small systems. The main challenge is that the number of local minima grows exponentially with the system size. In this work, we propose two crossover-mutation schemes based on graph theory to accelerate evolutionary structure searching by automatic decomposition methods. These schemes can detect molecules or clusters inside periodic networks using quotient graphs for crystals, and the decomposition can dramatically reduce the search space. Test examples, including the high-pressure phases of methane, ammonia, MgAl2O4 and boron, show that these new evolution schemes can significantly improve the success rate and search efficiency compared with the standard method in both isolated and extended systems.
Funding: Financially supported by the National Natural Science Foundation of China (Nos. 41831288 and 51672257); the Fundamental Research Funds for the Central Universities (Nos. 2652018305 and 2652017335); the Guangdong Innovation Research Team for Higher Education (No. 2017KCXTD030); the High-Level Talents Project of Dongguan University of Technology (No. KCYKYQD2017017); the Engineering Research Center of None-food Biomass Efficient Pyrolysis and Utilization Technology of Guangdong Higher Education Institutes (No. 2016GCZX009); and the Russian Science Foundation (No. 19-77-10013).
Abstract: Mineral apatite compounds have attracted significant interest due to their chemical stability and adjustable hexagonal structure, which make them suitable as new photovoltaic functional materials. The band gap of natural apatite is ~5.45 eV, and such a large value limits its applications in the fields of catalysis and energy devices. In this research, we designed a method to narrow the band gap via the tetrahedral substitution effect in apatite-based compounds. Density functional theory (DFT) and experimental investigation of the electronic and optical properties revealed that the continuous incorporation of [MO_(4)]^(4-) tetrahedrons (M = Si, Ge, Sn, and Mn) into the crystal lattice can significantly reduce the band gap. In particular, this phenomenon was observed when the [MnO_(4)]^(4-) tetrahedron replaces the [PO_(4)]^(4-) tetrahedron, because a Mn 3d-derived conduction band minimum (CBM) forms and interacts with other elements, leading to band broadening and an obvious reduction of the band gap. This approach allowed us to propose a novel scheme for band gap engineering of apatite-based compounds toward modification across the entire spectral range.
Abstract: The secondary structure of a protein is critical for establishing a link between the protein's primary and tertiary structures. For this reason, it is important to design methods for accurate protein secondary structure prediction. Most of the existing computational techniques for protein structural and functional prediction are based on machine learning with shallow frameworks. Different deep learning architectures have already been applied to tackle the protein secondary structure prediction problem. In this study, deep learning based models, i.e., a convolutional neural network and a long short-term memory network, were proposed for protein secondary structure prediction. The input to the proposed models is amino acid sequences derived from the CulledPDB dataset. Hyperparameter tuning with cross validation was employed to obtain the best parameters for the proposed models. The proposed models enable effective processing of amino acids and attain approximately 87.05% and 87.47% Q3 accuracy of protein secondary structure prediction for the convolutional neural network and long short-term memory models, respectively.
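For readers unfamiliar with the Q3 metric quoted above, the following sketch computes it as the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The function name and the toy sequences are hypothetical and are not taken from the study.

```python
def q3_accuracy(true_ss, pred_ss):
    """Percentage of residues whose 3-state label (H, E, or C) is predicted correctly."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have the same length")
    if not true_ss:
        raise ValueError("sequences must be non-empty")
    # Count positions where the predicted state matches the observed state.
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)

# Toy example: 7 of 8 residues agree.
score = q3_accuracy("HHHEECCC", "HHHECCCC")
```

In practice the score is usually computed over all residues of a test set rather than per protein, which weights long chains more heavily.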
Funding: The National Natural Science Foundation of China and the Military Science Foundation in the Ministry of Electronic Industry of China.
Abstract: Because the Welch-Berlekamp (W-B) theorem accurately predicts the structure of the error locator polynomials of error patterns, it leads to the Welch-Berlekamp algorithm for decoding cyclic codes. However, it is only valid within the BCH bound. Here, a prediction formula for error locator determination is presented based on a study of the theory of the minimal homogeneous interpolation problem, which extends the Welch-Berlekamp theorem and expands the Welch-Berlekamp algorithm so that the constraint from the BCH
Abstract: Four isomers of the three-dimensionally connected bare boron cationic cluster B were investigated using ab initio molecular orbital theory at the HF/6-31G level. The results show that the D5h symmetric isomer of B is a possible candidate for a stable geometry with a closed structure.
Funding: Funded by the National Natural Science Foundation of China (Grant Nos. 11774158 to JZ, 11934008 to WW, and 11974173 to WFL).
Abstract: RNAs play crucial and versatile roles in cellular biochemical reactions. Since experimental approaches for determining their three-dimensional (3D) structures are costly and inefficient, it is greatly advantageous to develop computational methods to predict RNA 3D structures. For these methods, designing a model or scoring function for structure quality assessment is an essential but challenging step. In this study, we designed and trained a deep learning model to tackle this problem. The model was based on a graph convolutional network (GCN) and named RNAGCN. The model provided a natural way of representing RNA structures, avoided complex algorithms to preserve atomic rotational equivalence, and was capable of automatically extracting features from structural patterns. Testing results on two datasets demonstrated that RNAGCN performs similarly to or better than four leading scoring functions. Our approach provides an alternative way of assessing RNA tertiary structures and may facilitate RNA structure prediction. RNAGCN can be downloaded from https://gitee.com/dcw-RNAGCN/rnagcn.
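To give a sense of what a graph convolution computes, here is a minimal sketch of one generic layer in the commonly used symmetric-normalized form, ReLU(D^(-1/2) (A + I) D^(-1/2) H W). The abstract does not describe RNAGCN's actual architecture, so this layer, the toy graph, and all names below are assumptions for illustration only.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One generic graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj:      (n, n) symmetric adjacency matrix (e.g. atoms within a distance cutoff).
    features: (n, f_in) node feature matrix H.
    weight:   (f_in, f_out) weight matrix W (learned in a real model).
    """
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1)                      # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)

# Toy graph: three nodes in a path (0 - 1 - 2), identity features and weights.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
out = gcn_layer(adj, np.eye(3), np.eye(3))
```

Each output row mixes a node's own features with those of its neighbours, and the result depends only on graph connectivity, not on how the structure is rotated in space.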
Abstract: Superconductive properties of oxides were predicted by an artificial neural network (ANN) method with structural and chemical parameters as inputs. The predicted properties include superconductivity for oxides, distributed ranges of the superconductive transition temperature (T_c) for complex oxides, and T_c values for cuprate superconductors. The calculated results indicated that the adjusted ANN can be used to predict superconductive properties for unknown oxides.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 12204280 and 12147135), the Natural Science Foundation of Shandong Province of China (Grant No. ZR202103010004), and the Postdoctoral Science Foundation of China (Grant No. 2021M691980).
Abstract: The reaction of water with other materials has been a central topic in high-pressure physics research, because the Earth, super-Earths, Uranus, Neptune and other planets contain a great amount of water in their interiors. However, the reaction between star-rich MgO and water under ultra-high pressure remains poorly understood. Here, using ab initio evolutionary structure prediction to study the structures of the MgO-H_(2)O system at 300 GPa-600 GPa, we find that (MgO)2H_(2)O and MgO(H_(2)O)2 could become stable. The (MgO)2H_(2)O compound may be an important component of super-Earths and of the ice-rock boundary of Uranus and Neptune. Furthermore, it may be the reservoir under high pressure before the formation of the core of the Earth or of other super-Earths. The current findings could expand our knowledge and improve our understanding of the evolution and composition of planets.
Funding: The Natural Science Foundation of China (Nos. 61671362 and 62071366).
Abstract: Atomization energy (AE), the energy change when a polyatomic molecule decomposes into its constituent atoms, is an important indicator for measuring material stability and reactivity. Predicting AE from the structural information of molecules has been a focus of researchers, but existing methods have limitations such as being time-consuming or requiring complex preprocessing and large amounts of training data. Deep learning (DL), a branch of machine learning (ML), has shown promise in learning internal rules and hierarchical representations of sample data, making it a potential solution for AE prediction. To address this problem, we propose a natural-parameter network (NPN) approach for AE prediction. This method establishes a clearer statistical interpretation of the relationship between the network's output and the given data. We use the Coulomb matrix (CM) method to represent each compound as a structural information matrix. Furthermore, we designed an end-to-end predictive model. Experimental results demonstrate that our method achieves excellent performance on the QM7 and BC2P datasets, and the mean absolute error (MAE) obtained on the QM7 test set ranges from 0.2 kcal/mol to 3 kcal/mol. The best result of our method is approximately an order of magnitude more accurate than the 3 kcal/mol reported in published works. Additionally, our approach significantly shortens the prediction time. Overall, this study presents a promising approach to accelerating structure-based prediction using DL and provides a valuable contribution to the field of chemical energy prediction.
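The Coulomb matrix representation mentioned in this abstract has a standard definition: diagonal entries are 0.5 Z_i^2.4 and off-diagonal entries are Z_i Z_j / |R_i - R_j|, with nuclear charges Z and positions R in atomic units. The sketch below implements that definition; the function name and the H2 geometry are illustrative choices, and any sorting or padding the authors may apply before feeding the matrix to their network is not covered.

```python
import numpy as np

def coulomb_matrix(charges, coords):
    """Coulomb matrix of a molecule.

    charges: nuclear charges Z_i, shape (n,).
    coords:  Cartesian coordinates R_i in bohr, shape (n, 3).
    C_ii = 0.5 * Z_i**2.4 and C_ij = Z_i * Z_j / |R_i - R_j| for i != j.
    """
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    # Pairwise interatomic distances; the diagonal is zero and is overwritten below.
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        cm = np.outer(z, z) / dist
    np.fill_diagonal(cm, 0.5 * z ** 2.4)
    return cm

# Illustrative example: H2 with a bond length of 1.4 bohr.
cm_h2 = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])
```

The matrix is invariant to translation and rotation of the molecule, which is one reason it is widely used as a structural descriptor.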
Abstract: The discovery of novel materials with desired properties is essential to the advancement of energy-related technologies. Despite the rapid development of computational infrastructures and theoretical approaches, progress so far has been limited by the empirical and serial nature of experimental work. Fortunately, the situation is changing thanks to the maturation of theoretical tools such as density functional theory, high-throughput screening, crystal structure prediction, and emerging approaches based on machine learning. Together these recent innovations in computational chemistry, data informatics, and machine learning have acted as catalysts for revolutionizing material design and hopefully will lead to faster kinetics in the development of energy-related industries. In this report, recent advances in material discovery methods are reviewed for energy devices. Three paradigms based on empiricism-driven experiments, database-driven high-throughput screening, and data informatics-driven machine learning are discussed critically. Key methodological advancements involved are reviewed, including high-throughput screening, crystal structure prediction, and generative models for target material design. Their applications in energy-related devices such as batteries, catalysts, and photovoltaics are selectively showcased.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 12034009, 91961204, and 11974134).
Abstract: As a fundamental thermodynamic variable, pressure can alter bonding patterns and drive phase transitions, leading to the creation of new high-pressure phases with exotic properties that are inaccessible at ambient pressure. Using the swarm intelligence structural prediction method, the phase transition of TiF_(3) from the R-3c to the Pnma phase was predicted at high pressure, accompanied by the destruction of TiF_(6) octahedra and the formation of TiF_(8) square antiprismatic units. The Pnma phase of TiF_(3), formed using the laser-heated diamond-anvil-cell technique, was confirmed via high-pressure x-ray diffraction experiments. Furthermore, in situ electrical measurements indicate that the newly found Pnma phase has a semiconducting character, which is consistent with the electronic band structure calculations. Finally, it was shown that this pressure-induced phase transition is a general phenomenon in ScF_(3), VF_(3), CrF_(3), and MnF_(3), offering valuable insights into the high-pressure phases of transition metal trifluorides.
Funding: Supported by the Beijing Natural Science Foundation (2192029); the National Key Research and Development Program of China (2017YFB0702100); the National Natural Science Foundation of China (11404017, 12004145); the Technology Foundation for Selected Overseas Chinese Scholars; the Ministry of Human Resources and Social Security of China; the Academic Excellence Foundation of BUAA for PhD Students; the Faraday Institution (grant number FIRG017); the Singapore National Research Foundation (NRF-NRFF2017-04); and the Jiangxi Provincial Natural Science Foundation (20212BAB214032).
Abstract: SiO-based materials are promising alloying and conversion-type anode materials for lithium-ion batteries and have recently been found to be excellent dendrite-proof layers for lithium-metal batteries. However, only a small fraction of the Li–Si–O compositional space has been reported, significantly impeding the understanding of the phase transition mechanisms and the rational design of these materials both as anodes and as protection layers for lithium-metal anodes. Herein, we identify three new thermodynamically stable phases within the Li–Si–O ternary system (Li_(2)SiO_(5), Li_(4)SiO_(6), and Li_(4)SiO_(8)) in addition to the existing records via first-principles calculations. The electronic structure simulation shows that the Li_(2)SiO_(5) and Li_(4)SiO_(8) phases are metallic in nature, ensuring the high electronic conductivity required of electrodes. Moduli calculations demonstrate that the mechanical strength of Li–Si–O phases is much higher than that of lithium metal. The diffusion barriers of interstitial Li range from 0.1 to 0.6 eV, and interstitial Li hopping, rather than vacancy diffusion, serves as the dominant diffusion mechanism in the Li–Si–O ternary systems. These findings provide a new strategy for the future discovery of improved alloying anodes for lithium-ion batteries and offer important insight into the phase transformation mechanism of alloy-type protection layers on lithium-metal anodes.
Funding: Supported by the Project of the National Natural Science Foundation (31360300 and 31560362) and the Key Project of the Tibet Autonomous Region (XZXTCX-2016).
Abstract: A carbonic anhydrase (CA) transcript was obtained from the Contig library according to the published sequencing information of the buckwheat transcripts. The full length of the CA gene was amplified by reverse transcription PCR (RT-PCR). The bioinformatics analysis showed that the full length of the FsCA1 gene was 1233 bp and the open reading frame was 978 bp, encoding 325 amino acids. The molecular weight was 35.11 kDa and the isoelectric point was 7.59; there were 9 α-helices, 6 β-sheets, and many random coils and extended chains. The protein contained one signal peptide and one transmembrane region, and had two conserved amino acid domains typical of beta-type carbonic anhydrase. Subcellular localization showed that the protein is most likely to appear in the chloroplast. The three-dimensional structure model of FsCA1 was built by the homology modeling method, indicating that the homo-octamers of buckwheat CA and pea CA match well, so it can be inferred that buckwheat CA is also a homo-octamer. Real-time quantitative PCR was used to detect the expression of FsCA1 in different organs of buckwheat. The results showed that FsCA1 had the highest expression level in leaves, followed by stems, and the lowest in roots.
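The reported figures are mutually consistent if the 978 bp open reading frame is assumed to include the stop codon: 978 / 3 = 326 codons, of which 325 encode amino acids. The short check below makes that arithmetic explicit; the function name is hypothetical and the stop-codon assumption is not stated in the abstract.

```python
def protein_length_from_orf(orf_bp, includes_stop=True):
    """Number of encoded amino acids for an ORF of the given length in base pairs."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    # Assumption: when the ORF length counts the stop codon, it encodes no residue.
    return codons - 1 if includes_stop else codons

n_residues = protein_length_from_orf(978)
```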