Molecular machines are key to cellular activity, where they convert chemical and light energy into efficient mechanical work. During the last 60 years, designing molecular structures capable of generating unidirectional mechanical motion at the nanoscale has been the topic of intense research. Effective progress has been made, attributed to advances in fields such as supramolecular chemistry, biology, nanotechnology, and informatics. However, individual molecular machines can only produce work on the nanometer scale and generally have only a single functionality. To address these problems, collective behaviors realized by integrating several or more of these individual mechanical units in space and time have become a new paradigm. In this review, we comprehensively discuss recent developments in the collective behaviors of molecular machines. In particular, collective behavior is divided into two paradigms. One is the appropriate integration of molecular machines to efficiently amplify molecular motions and deformations and so construct novel functional materials. The other is the construction of swarming modes at the supramolecular level to perform nanoscale or microscale operations. We discuss design strategies for both modes and focus on the modulation of features and properties. Subsequently, to address existing challenges, the idea of transferring experience gained in the field of micro/nano robotics is presented, offering prospects for future developments in the collective behavior of molecular machines.
The rapid advancement and broad application of machine learning (ML) have driven a groundbreaking revolution in computational biology. One of the most cutting-edge and important applications of ML is its integration with molecular simulations to improve the sampling efficiency of the vast conformational space of large biomolecules. This review focuses on recent studies that use ML-based techniques to explore the protein conformational landscape. We first highlight the recent development of ML-aided enhanced sampling methods, including heuristic algorithms and neural networks that are designed to refine the selection of reaction coordinates for the construction of a bias potential, or to facilitate the exploration of unsampled regions of the energy landscape. Further, we review the development of autoencoder-based methods that combine molecular simulations and deep learning to expand the search for protein conformations. Lastly, we discuss the cutting-edge methodologies for the one-shot generation of protein conformations with precise Boltzmann weights. Collectively, this review demonstrates the promising potential of machine learning in revolutionizing our insight into the complex conformational ensembles of proteins.
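As a small illustration of what "Boltzmann weights" means for a set of conformations, the sketch below converts conformer energies into normalized statistical weights at a given temperature. It is not taken from the review: the function name, the example energies, and the choice of kJ/mol units are assumptions made for illustration only.

```python
import math

# Molar gas constant in kJ/(mol*K); energies are assumed to be in kJ/mol.
R_KJ_PER_MOL_K = 0.008314462618


def boltzmann_weights(energies_kj_mol, temperature_k=300.0):
    """Return normalized Boltzmann weights for a list of conformer energies."""
    beta = 1.0 / (R_KJ_PER_MOL_K * temperature_k)
    # Shift by the minimum energy so the exponentials cannot overflow.
    e_min = min(energies_kj_mol)
    factors = [math.exp(-beta * (e - e_min)) for e in energies_kj_mol]
    partition = sum(factors)
    return [f / partition for f in factors]


# Hypothetical energies of three conformations (kJ/mol).
weights = boltzmann_weights([0.0, 2.5, 10.0], temperature_k=300.0)
```

A generative model that produces conformations "with precise Boltzmann weights" would, in this picture, sample each conformation with a probability proportional to these weights.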
As the simplest hydrogen-bonded alcohol, liquid methanol has attracted intensive experimental and theoretical interest. However, theoretical investigations of this system have primarily relied on empirical intermolecular force fields or ab initio molecular dynamics with semilocal density functionals. Inspired by recent studies on bulk water using increasingly accurate machine learning force fields, we report a new machine learning force field for liquid methanol based on the hybrid functional revPBE0 plus a dispersion correction. Molecular dynamics simulations with this machine learning force field are orders of magnitude faster than ab initio molecular dynamics simulations, yielding the radial distribution functions, self-diffusion coefficients, and hydrogen bond network properties with very small statistical errors. The resulting structural and dynamical properties compare well with the experimental data, demonstrating the superior accuracy of this machine learning force field. This work represents a successful step toward a first-principles description of this benchmark system and showcases the general applicability of machine learning force fields in studying liquid systems.
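The radial distribution function mentioned above can be estimated from a single simulation frame by histogramming pair distances. The following is a minimal sketch, assuming a cubic periodic box and a single particle species; the function name, bin count, and cutoff are illustrative choices and not the authors' analysis code.

```python
import math


def radial_distribution(positions, box_length, n_bins=50, r_max=None):
    """Estimate g(r) for particles in a cubic periodic box (single frame)."""
    n = len(positions)
    if r_max is None:
        r_max = box_length / 2.0  # largest radius valid under minimum image
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = 0.0
            for axis in range(3):
                delta = positions[i][axis] - positions[j][axis]
                # Minimum-image convention for periodic boundaries.
                delta -= box_length * round(delta / box_length)
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                k = min(int(r / dr), n_bins - 1)
                counts[k] += 2  # each pair contributes to both particles
    density = n / box_length ** 3
    centers, g = [], []
    for k, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        centers.append((k + 0.5) * dr)
        g.append(c / (n * density * shell))  # normalize by ideal-gas count
    return centers, g
```

For uncorrelated (ideal-gas) positions the estimate fluctuates around 1 at all distances; in a liquid such as methanol, peaks in g(r) mark the hydrogen-bonded and further solvation shells. In practice the histogram is averaged over many frames to reduce statistical error.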
The computational approaches of support vector machine (SVM), support vector regression (SVR), and molecular docking have been widely used to predict active compounds. In this work, to improve the accuracy and reliability of prediction, these three computational approaches were combined to predict potential cytochrome P450 1A2 (CYP1A2) inhibitors. The accuracy of the optimal SVM qualitative model was 99.432%, 97.727%, and 91.667% for the training set, internal test set, and external test set, respectively, showing that this model has high discrimination ability. The R² and mean square error for the optimal SVR quantitative model were 0.763 and 0.013 for the training set and 0.753 and 0.056 for the test set, respectively, indicating that this SVR model has high predictive ability for the biological activities of compounds. According to the results of the SVM and SVR models, several types of descriptors were identified as essential to bioactivity prediction, including connectivity indices, constitutional descriptors, and functional group counts. Moreover, molecular docking studies were used to reveal the binding poses and binding affinity of potential inhibitors interacting with CYP1A2. The amino acids THR124 and ASP320 could form key hydrogen bond interactions with active compounds, and the amino acids ALA317 and GLY316 could form strong hydrophobic interactions with active compounds. The models obtained above were applied to discover potential CYP1A2 inhibitors from natural products, which could help predict CYP-mediated drug-drug interactions and provide useful guidance and reference for rational drug combination therapy. A set of 20 potential CYP1A2 inhibitors was obtained. Part of the results was consistent with published references, which further indicates the accuracy of these models and the reliability of this combinatorial computation strategy.
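The two regression metrics quoted for the SVR model, the coefficient of determination and the mean square error, can be computed as in the sketch below. The function names and the toy activity values are hypothetical and serve only to show the definitions; they are not data from the study.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted activities, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
mse = mean_squared_error(observed, predicted)
r2 = r_squared(observed, predicted)
```

Reporting both metrics on the training set and on a held-out test set, as the abstract does, is what allows a reader to judge whether the model generalizes or merely fits the training data.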
Non-ionic deep eutectic solvents (DESs) are non-ionic designer solvents with various applications in catalysis, extraction, carbon capture, and pharmaceuticals. However, discovering new DES candidates is challenging due to a lack of efficient tools that accurately predict DES formation. The search for DESs relies heavily on intuition or trial-and-error processes, leading to low success rates or missed opportunities. Recognizing that hydrogen bonds (HBs) play a central role in DES formation, we aim to identify HB features that distinguish DES from non-DES systems and use them to develop machine learning (ML) models to discover new DES systems. We first analyze the HB properties of 38 known DES and 111 known non-DES systems using their molecular dynamics (MD) simulation trajectories. The analysis reveals that DES systems have two unique features compared to non-DES systems: the DESs have (1) more imbalance between the numbers of the two intra-component HBs and (2) more and stronger inter-component HBs. Based on these results, we develop 30 ML models using ten algorithms and three types of HB-based descriptors. The model performance is first benchmarked using the average and minimal receiver operating characteristic (ROC) area under the curve (AUC) values. We also analyze the importance of individual features in the models, and the results are consistent with the simulation-based statistical analysis. Finally, we validate the models using the experimental data of 34 systems. The extra trees forest model outperforms the other models in the validation, with an ROC-AUC of 0.88. Our work illustrates the importance of HBs in DES formation and shows the potential of ML in discovering new DESs.
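The ROC-AUC used to benchmark the classifiers has a simple probabilistic reading: it is the chance that a randomly chosen positive (here, a DES system) receives a higher score than a randomly chosen negative (a non-DES system). The sketch below computes it directly from that definition. The function name and the toy labels and scores are illustrative assumptions, not the study's data or code.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical labels (1 = DES, 0 = non-DES) and model scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because the metric depends only on the ranking of scores, it is insensitive to the imbalance between the 38 DES and 111 non-DES systems, which is one reason it is a common choice for this kind of benchmark.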
GeTe has attracted extensive research interest for thermoelectric applications. In this paper, we first train a neuroevolution potential (NEP) based on a dataset constructed by ab initio molecular dynamics, with the Gaussian approximation potential (GAP) as a reference. The phonon density of states is then calculated with the two machine learning potentials and compared with density functional theory results, with the GAP having higher accuracy. Next, the thermal conductivity of a GeTe crystal at 300 K is calculated by the equilibrium molecular dynamics method using both machine learning potentials, and both are in good agreement with the experimental results; however, the calculation with the NEP is about 500 times faster than with the GAP. Finally, the lattice thermal conductivity in the range of 300 K to 600 K is calculated using the NEP. The lattice thermal conductivity decreases as the temperature increases due to the phonon anharmonic effect. This study provides a theoretical tool for the study of the thermal conductivity of GeTe.
Thermodynamic properties of complex systems play an essential role in developing chemical engineering processes. It remains a challenge to predict the thermodynamic properties of complex systems over a wide range of conditions and to describe the behavior of ions and molecules in complex systems. Machine learning has emerged as a powerful tool to resolve this issue because it can describe complex relationships beyond the capacity of traditional mathematical functions. This minireview summarizes some fundamental concepts of machine learning methods and, using several examples, their applications in three aspects of molecular thermodynamics. The first aspect is to apply machine learning methods to predict the thermodynamic properties of a broad spectrum of systems based on known data. The second aspect is to integrate machine learning and molecular simulations to accelerate the discovery of materials. The third aspect is to develop machine learning force fields that can eliminate the barrier between quantum mechanics and all-atom molecular dynamics simulations. The applications in these three aspects illustrate the potential of machine learning in the molecular thermodynamics of chemical engineering. We also discuss the perspective of the broad applications of machine learning in chemical engineering.
Lung cancer is the most prevalent cancer diagnosis and the leading cause of cancer death worldwide. Therapeutic failure in lung adenocarcinoma (LUAD) is heavily influenced by drug resistance. This challenge stems from the diverse cell populations within the tumor, each having unique genetic, epigenetic, and phenotypic profiles. Such variations lead to varied therapeutic responses, thereby contributing to tumor relapse and disease progression. Methods: The Genomics of Drug Sensitivity in Cancer (GDSC) database was used in this investigation to obtain the mRNA expression dataset, genomic mutation profile, and drug sensitivity information of NSCLC. Machine learning (ML) methods, including Random Forest (RF), Artificial Neural Network (ANN), and Support Vector Machine (SVM), were used to predict the response status of each compound based on the mRNA and mutation characteristics determined using statistical methods. The most suitable method for each drug was proposed by comparing the prediction accuracy of the different ML methods, and the selected mRNA and mutation characteristics were identified as molecular features for the drug-responsive cancer subtype. Finally, the prognostic influence of molecular features on the mutational subtype of LUAD was examined in publicly available datasets. Results: Our analyses yielded 1,564 gene features and 45 mutational features for 46 drugs. Applying the ML approach to predict the drug response for each medication revealed an outstanding performance for SVM in predicting Afuresertib drug response (area under the curve [AUC] 0.875) using CIT, GAS2L3, STAG3L3, ATP2B4-mut, and IL15RA-mut as molecular features. Furthermore, the ANN algorithm using 9 mRNA characteristics together with CCL23-mut demonstrated the highest prediction performance (AUC 0.780) for Gefitinib. Conclusion: This work extensively investigated the mRNA and mutation signatures associated with drug response in LUAD using a machine-learning approach and proposed a priority algorithm to predict drug response for different drugs.
GaP has been shown to be a promising photoelectrocatalyst for selective CO₂ reduction to methanol. Due to the relevance of the interface structure to important processes such as electron/proton transfer, a detailed understanding of the GaP(110)-water interfacial structure is of great importance. Ab initio molecular dynamics (AIMD) can be used to obtain microscopic information on the interfacial structure. However, the GaP(110)-water interface cannot converge to an equilibrated structure on the time scale of an AIMD simulation. In this work, we perform machine learning accelerated molecular dynamics (MLMD) to overcome the insufficient sampling of AIMD. With the help of MLMD, we unravel the microscopic structure of the GaP(110)-water interface and obtain a deeper understanding of the mechanisms of proton transfer at this interface, which will pave the way for gaining valuable insights into photoelectrocatalytic mechanisms and improving the performance of photoelectrochemical cells.
Abstract: Using a vectorized parallel Lennard-Jones fluid program, we have demonstrated that vectorizing a general-purpose parallel molecular package for simulating biomolecules, which currently runs on the Connection Machine CM-5 using CMMD message passing, would offer a significant improvement over a non-vectorized version. Our results indicate that the Lennard-Jones fluid program written in C*/CMMD is five times faster than the same program written in C/CMMD.
Liposomes are among the most widely used carriers for drug delivery because of their excellent biocompatibility and biodegradability. Due to the complex formulation components and preparation process, formulation screening mostly relies on a trial-and-error process with low efficiency. Here, liposome formulation prediction models have been built using machine learning (ML) approaches. The important parameters of liposomes, including size, polydispersity index (PDI), zeta potential, and encapsulation, are predicted individually by the optimal ML algorithm, while the formulation features are also ranked to provide guidance for formulation design. The analysis of key parameters reveals that drug molecules with logS of [-3, -6], molecular complexity of [500, 1000], and XLogP3 (≥2) are preferred for preparing liposomes with higher encapsulation. In addition, naproxen (NAP) and palmatine HCl (PAL), representing insoluble and water-soluble molecules respectively, are prepared as liposome formulations to validate the prediction ability. The consistency between predicted and experimental values verifies the satisfactory accuracy of the ML models. As the drug properties are critical for liposome particles, the molecular interactions and dynamics of NAP and PAL liposomes are further investigated by coarse-grained molecular dynamics simulations. The modeled structure reveals that NAP molecules can distribute into the lipid layer, while most PAL molecules aggregate in the inner aqueous phase of the liposome. The completely different physical states of NAP and PAL confirm the importance of drug properties for liposome formulations. In summary, general prediction models are built to predict liposome formulations, and the impacts of key factors are analyzed by combining ML with molecular modeling. The availability and rationality of these intelligent prediction systems have been demonstrated in this study, and they could be applied to liposome formulation development in the future.
The drug development process takes a long time because it requires sorting through a large number of inactive compounds in a large collection chosen for study and selecting only the most pertinent compounds that can bind to a disease protein. The use of virtual screening in pharmaceutical research is growing in popularity, and it is crucial during the early phases of drug research and development. Chemical compound searches are now more narrowly targeted. Because the databases contain more and more ligands, this method needs to be quick and exact. Neural network fingerprints have been created more effectively than the well-known Extended Connectivity Fingerprint (ECFP). Although the conventional graph network generates a better-encoded fingerprint, only the largest sub-graph is taken into consideration to learn the representation. When using an average or maximum pooling layer, it also contains unrelated data. To address these problems, this article proposes the Graph Convolutional Attention Network (GCAN), a graph neural network with an attention mechanism. Additionally, it gives more weight to the nodes or sub-graphs that are used to create the molecular fingerprint. The generated fingerprint is used to classify drugs using ensemble learning. Ensemble stacking is applied with Support Vector Machines (SVM), Random Forest, Naive Bayes, Decision Trees, AdaBoost, and Gradient Boosting as base classifiers. Compared to existing models, the proposed GCAN fingerprint with an ensemble model achieves relatively high accuracy, sensitivity, specificity, and area under the curve. Additionally, the ensemble learning with the generated molecular fingerprint yields 91% accuracy, outperforming earlier approaches.
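Accuracy, sensitivity, and specificity, the classification metrics named above, all follow from the four cells of a binary confusion matrix. The sketch below shows the definitions; the function name and the toy labels are hypothetical and are not results from the article.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical labels: 1 = active compound, 0 = inactive compound.
metrics = classification_metrics(
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0, 1],
)
```

In virtual screening, where inactive compounds typically far outnumber actives, sensitivity and specificity are more informative than accuracy alone, which is why they are usually reported together.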
Finding energetic materials with tailored properties is a persistent challenge because trial-and-error research is inefficient. Herein, a methodology combining domain knowledge, a machine learning algorithm, and experiments is presented for accelerating the discovery of novel energetic materials. A high-throughput virtual screening (HTVS) system is established that integrates on-demand molecular generation with machine learning models covering the prediction of molecular properties and the scoring of crystal packing modes. With the proposed HTVS system, candidate molecules with promising properties and a desirable crystal packing mode are rapidly targeted from a generated molecular space containing 25112 molecules. Furthermore, a study of the crystal structure and properties shows that the good comprehensive performance of the target molecule agrees with the predicted results, thus verifying the effectiveness of the proposed methodology. This work demonstrates a new research paradigm for discovering novel energetic materials and can be extended to other organic materials without manifest obstacles.
Defects in graphene can profoundly impact its extraordinary properties, ultimately influencing the performance of graphene-based nanodevices. Methods to detect defects in graphene with atomic resolution can be technically demanding and involve complex sample preparation. An alternative approach is to observe the thermal vibration properties of the graphene sheet, which reflect defect information, but only implicitly. Machine learning, an emerging data-driven approach that offers solutions for learning hidden patterns from complex data, has been extensively applied in material design and discovery problems. In this paper, we propose a machine learning-based approach to detect graphene defects by discovering the hidden correlation between defect locations and thermal vibration features. Two prediction strategies are developed: an atom-based method, which constructs data by atom indices, and a domain-based method, which constructs data by domain discretization. Results show that while the atom-based method is capable of detecting a single-atom vacancy, the domain-based method can detect an unknown number of multiple vacancies up to atomic precision. Both methods achieve approximately 90% prediction accuracy on the data reserved for testing, indicating promising extrapolation to unseen graphene configurations. The proposed strategy offers promising solutions for the non-destructive evaluation of nanomaterials and accelerates new material discoveries.
Zirconia has been extensively used in aerospace, military, biomedical, and industrial fields due to its unusual combination of excellent mechanical, electrical, and thermal properties. However, the fundamental and critical phase transition process of zirconia has not been well studied because its first-order phase transitions, with their formidable energy barriers, are difficult to simulate. Here, we generated a machine learning interatomic potential with ab initio accuracy to uncover the mechanisms behind the phase transitions of zirconia at ambient pressure. The machine learning potential precisely characterized atomic interactions among all zirconia allotropes and liquid zirconia over a wide temperature range. We realized the challenging reversible first-order monoclinic-tetragonal and cubic-liquid phase transition processes with enhanced sampling techniques. From the thermodynamic information, we gained a better understanding of the thermal hysteresis phenomenon in the martensitic monoclinic-tetragonal transition. The phase diagram of zirconia from our machine learning potential based molecular dynamics simulations corresponded well with experimental results.
In this study, 10 novel anti-inflammatory peptides were identified from duck liver, and their molecular mechanism was demonstrated based on machine learning and molecular docking. Using Sephadex G-15 gel chromatography separation, reversed-phase high-performance liquid chromatography purification, liquid chromatography-tandem mass spectrometry identification, and BIOPEP database comparison, 10 novel anti-inflammatory peptides were initially found. Their strong angiotensin-converting enzyme (ACE) inhibition and anti-inflammatory properties were confirmed by machine learning. With binding energies below -20.93 kJ/mol, molecular docking revealed that they could efficiently bind to the active pockets of tumor necrosis factor α (TNF-α), interleukin 6 (IL-6), cyclooxygenase 2 (COX-2), and nuclear factor κB (NF-κB) proteins, indicating that the peptides can spontaneously form complexes through hydrogen bonding and hydrophobic interactions with the protein binding pockets. In the lipopolysaccharide-induced RAW264.7 cell model, the release of NO, TNF-α, and IL-6 and the mRNA expression of inflammatory factors (TNF-α, IL-6, COX-2, and NF-κB) were significantly inhibited by these peptides. We conclude that their anti-inflammatory effects may arise from inhibiting the phosphorylation of the inhibitor of NF-κB (IκBα) in the cytoplasm and preventing the translocation of NF-κB p65 from the cytoplasm to the nucleus, thereby regulating the NF-κB signaling pathway. This study is valuable for the screening of anti-inflammatory peptides and the investigation of their mechanism of action.
Shear deformation mechanisms of diamond-like carbon (DLC) are commonly unclear, since its thickness of several micrometers limits the detailed analysis of its microstructural evolution and mechanical performance, which in turn hinders the improvement of the friction and wear performance of DLC. This study investigates this issue using molecular dynamics simulation and machine learning (ML) techniques. It is indicated that the changes in the mechanical properties of DLC are mainly due to the expansion and reduction of sp3 networks, causing the stick-slip patterns in shear force. In addition, cluster analysis showed that the sp2-sp3 transitions arise in the stick stage, while the sp3-sp2 transitions occur in the slip stage. To analyze the mechanisms governing bond breaking/re-formation in these transitions, a Random Forest (RF) model is used, which identifies the kinetic energies of sp3 atoms and their velocities along the loading direction as having the highest influence. This is because high kinetic energies of atoms can exacerbate the instability of the bonding state and increase the probability of bond breaking/re-formation. Finally, the RF model finds that the shear force of DLC is highly correlated with its potential energy and less correlated with its content of sp3 atoms. Since the changes in potential energy are caused by variations in the content of sp3 atoms and localized strains, potential energy is an ideal parameter for evaluating the shear deformation of DLC. The results can enhance the understanding of the shear deformation of DLC and support the improvement of its friction and wear performance.
Natural molecular machines have inspired the development of artificial molecular machines, which have the potential to revolutionize several areas of technology. Artificial molecular machines commonly employ molecular switches, molecular motors, and molecular shuttles as fundamental building blocks. Observing artificial molecular machines constructed from these building blocks can be highly challenging due to their small sizes and intricate behaviors. Modern instrumentation and advanced observational techniques play a crucial role in the observation and characterization of molecular machines. Furthermore, a well-designed molecular structure is also a critical factor in making molecular machines more observable. This review summarizes, from diverse perspectives, the common methods used to observe molecular machines and emphasizes the significance of understanding their behaviors in the design of superior artificial molecular machines.
Accurate prediction of protein-ligand complex structures is a crucial step in structure-based drug design. Traditional molecular docking methods exhibit limitations in terms of accuracy and sampling space, while relying on machine learning approaches alone may lead to invalid conformations. In this study, we propose a novel strategy that combines molecular docking and machine learning methods. First, the protein-ligand binding poses are predicted using a deep learning model. Subsequently, position-restricted docking on the predicted binding poses is performed using Uni-Dock, generating physically constrained and valid binding poses. Finally, the binding poses are re-scored and ranked using machine learning scoring functions. This strategy harnesses the predictive power of machine learning and the physical constraints of molecular docking. Evaluation experiments on multiple datasets demonstrate that, compared to using molecular docking or machine learning methods alone, our proposed strategy can significantly improve the success rate and accuracy of protein-ligand complex structure predictions.
Funding: Supported by the National Key R&D Program of China (2018YFA0901700), the National Natural Science Foundation of China (22278241), a grant from the Institute Guo Qiang, Tsinghua University (2021GQG1016), and the Department of Chemical Engineering-iBHE Joint Cooperation Fund.
Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2023YFF1204402), the National Natural Science Foundation of China (Grant Nos. 12074079 and 12374208), the Natural Science Foundation of Shanghai (Grant No. 22ZR1406800), and the China Postdoctoral Science Foundation (Grant No. 2022M720815).
Abstract: The rapid advancement and broad application of machine learning (ML) have driven a groundbreaking revolution in computational biology. One of the most cutting-edge and important applications of ML is its integration with molecular simulations to improve the sampling efficiency of the vast conformational space of large biomolecules. This review focuses on recent studies that use ML-based techniques to explore the protein conformational landscape. We first highlight the recent development of ML-aided enhanced sampling methods, including heuristic algorithms and neural networks that are designed to refine the selection of reaction coordinates for the construction of the bias potential, or to facilitate the exploration of unsampled regions of the energy landscape. Further, we review the development of autoencoder-based methods that combine molecular simulations and deep learning to expand the search for protein conformations. Lastly, we discuss the cutting-edge methodologies for the one-shot generation of protein conformations with precise Boltzmann weights. Collectively, this review demonstrates the promising potential of machine learning in revolutionizing our insight into the complex conformational ensembles of proteins.
Funding: Supported by the CAS Project for Young Scientists in Basic Research (YSBR-005) and the National Natural Science Foundation of China (22325304, 22221003 and 22033007). We acknowledge the Supercomputing Center of USTC, Hefei Advanced Computing Center, and Beijing PARATERA Tech Co., Ltd., for providing high-performance computing services.
Abstract: As the simplest hydrogen-bonded alcohol, liquid methanol has attracted intensive experimental and theoretical interest. However, theoretical investigations of this system have primarily relied on empirical intermolecular force fields or ab initio molecular dynamics with semilocal density functionals. Inspired by recent studies on bulk water using increasingly accurate machine learning force fields, we report a new machine learning force field for liquid methanol based on the hybrid functional revPBE0 plus dispersion correction. Molecular dynamics simulations with this machine learning force field are orders of magnitude faster than ab initio molecular dynamics simulations, yielding the radial distribution functions, self-diffusion coefficients, and hydrogen bond network properties with very small statistical errors. The resulting structural and dynamical properties compare well with the experimental data, demonstrating the superior accuracy of this machine learning force field. This work represents a successful step toward a first-principles description of this benchmark system and showcases the general applicability of the machine learning force field in studying liquid systems.
Abstract: The computational approaches of support vector machine (SVM), support vector regression (SVR) and molecular docking have been widely used for the computation of active compounds. In this work, to improve the accuracy and reliability of prediction, these three computational approaches were combined to predict potential cytochrome P450 1A2 (CYP1A2) inhibitors. The accuracy of the optimal SVM qualitative model was 99.432%, 97.727%, and 91.667% for the training set, internal test set and external test set, respectively, showing that this model had high discrimination ability. The R² and mean square error for the optimal SVR quantitative model were 0.763 and 0.013 for the training set, and 0.753 and 0.056 for the test set, respectively, indicating that this SVR model has high predictive ability for the biological activities of compounds. According to the results of the SVM and SVR models, some types of descriptors were identified as essential to bioactivity prediction of compounds, including the connectivity indices, constitutional descriptors and functional group counts. Moreover, molecular docking studies were used to reveal the binding poses and binding affinity of potential inhibitors interacting with CYP1A2. In particular, the amino acids THR124 and ASP320 could form key hydrogen bond interactions with active compounds, and the amino acids ALA317 and GLY316 could form strong hydrophobic interactions with active compounds. The models obtained above were applied to discover potential CYP1A2 inhibitors from natural products, which could predict CYP-mediated drug-drug interactions and provide useful guidance and reference for rational drug combination therapy. A set of 20 potential CYP1A2 inhibitors was obtained. Part of the results was consistent with the literature, which further indicates the accuracy of these models and the reliability of this combinatorial computation strategy.
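As a reading aid for the regression metrics quoted above, here is a minimal sketch of how a mean squared error and an R² value can be computed from measured and predicted activities. It assumes R² denotes the coefficient of determination (the abstract does not say whether it is this or a squared correlation coefficient), and the function names and sample values are hypothetical, not taken from the study.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical activities, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(mean_squared_error(measured, predicted), r_squared(measured, predicted))
```

A model that always predicts the mean of the measured values scores R² = 0 under this definition, which is why values near 0.75 are read as substantially better than a constant baseline.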
Funding: Supported by Ignite Research Collaborations (IRC), Startup funds, and the UK Artificial Intelligence (AI) in Medicine Research Alliance Pilot (NCATS UL1TR001998 and NCI P30 CA177558).
Abstract: Non-ionic deep eutectic solvents (DESs) are non-ionic designer solvents with various applications in catalysis, extraction, carbon capture, and pharmaceuticals. However, discovering new DES candidates is challenging due to a lack of efficient tools that accurately predict DES formation. The search for DESs relies heavily on intuition or trial-and-error processes, leading to low success rates or missed opportunities. Recognizing that hydrogen bonds (HBs) play a central role in DES formation, we aim to identify HB features that distinguish DES from non-DES systems and use them to develop machine learning (ML) models to discover new DES systems. We first analyze the HB properties of 38 known DES and 111 known non-DES systems using their molecular dynamics (MD) simulation trajectories. The analysis reveals that DES systems have two unique features compared to non-DES systems: the DESs have (1) more imbalance between the numbers of the two intra-component HBs and (2) more and stronger inter-component HBs. Based on these results, we develop 30 ML models using ten algorithms and three types of HB-based descriptors. The model performance is first benchmarked using the average and minimal receiver operating characteristic (ROC) area under the curve (AUC) values. We also analyze the importance of individual features in the models, and the results are consistent with the simulation-based statistical analysis. Finally, we validate the models using the experimental data of 34 systems. The extra trees forest model outperforms the other models in the validation, with an ROC-AUC of 0.88. Our work illustrates the importance of HBs in DES formation and shows the potential of ML in discovering new DESs.
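The ROC-AUC figure used for benchmarking above can be illustrated with a short sketch. It uses the pairwise-ranking definition of ROC-AUC, that is, the probability that a randomly chosen positive (here, a DES system) receives a higher score than a randomly chosen negative, with ties counted as one half. The function name, labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC from the pairwise-ranking (Mann-Whitney) definition.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: 1 = DES, 0 = non-DES.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

The quadratic loop is adequate for validation sets of a few dozen systems; for large sets a sort-based rank computation would be the usual choice.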
Funding: Project supported by the A*STAR Computational Resource Centre through the use of its high-performance computing facilities, and by financial support from the China Scholarship Council (Grant No. 202206120136).
Abstract: GeTe has attracted extensive research interest for thermoelectric applications. In this paper, we first train a neuroevolution potential (NEP) based on a dataset constructed by ab initio molecular dynamics, with the Gaussian approximation potential (GAP) as a reference. The phonon density of states is then calculated with the two machine learning potentials and compared with density functional theory results, with the GAP having higher accuracy. Next, the thermal conductivity of a GeTe crystal at 300 K is calculated by the equilibrium molecular dynamics method using both machine learning potentials, and both are in good agreement with the experimental results; however, the calculation with the NEP is about 500 times faster than with the GAP. Finally, the lattice thermal conductivity in the range of 300 K to 600 K is calculated using the NEP. The lattice thermal conductivity decreases as the temperature increases due to the phonon anharmonic effect. This study provides a theoretical tool for the study of the thermal conductivity of GeTe.
Funding: Financial support from the National Natural Science Foundation of China (21676245 and 51933009), the National Key Research and Development Program of China (2017YFB0702502), the Leading Innovative and Entrepreneur Team Introduction Program of Zhejiang (2019R01006), and the Startup Funds of the University of Kentucky.
Abstract: Thermodynamic properties of complex systems play an essential role in developing chemical engineering processes. It remains a challenge to predict the thermodynamic properties of complex systems over a wide range and to describe the behavior of ions and molecules in complex systems. Machine learning emerges as a powerful tool to resolve this issue because it can describe complex relationships beyond the capacity of traditional mathematical functions. This minireview summarizes some fundamental concepts of machine learning methods and their applications in three aspects of molecular thermodynamics, using several examples. The first aspect is to apply machine learning methods to predict the thermodynamic properties of a broad spectrum of systems based on known data. The second aspect is to integrate machine learning and molecular simulations to accelerate the discovery of materials. The third aspect is to develop machine learning force fields that can eliminate the barrier between quantum mechanics and all-atom molecular dynamics simulations. The applications in these three aspects illustrate the potential of machine learning in the molecular thermodynamics of chemical engineering. We also discuss the perspective of the broad applications of machine learning in chemical engineering.
Abstract: Lung cancer is the most prevalent cancer diagnosis and the leading cause of cancer death worldwide. Therapeutic failure in lung adenocarcinoma (LUAD) is heavily influenced by drug resistance. This challenge stems from the diverse cell populations within the tumor, each having unique genetic, epigenetic, and phenotypic profiles. Such variations lead to varied therapeutic responses, thereby contributing to tumor relapse and disease progression. Methods: The Genomics of Drug Sensitivity in Cancer (GDSC) database was used in this investigation to obtain the mRNA expression dataset, genomic mutation profile, and drug sensitivity information of NSCLC. Machine Learning (ML) methods, including Random Forest (RF), Artificial Neural Network (ANN), and Support Vector Machine (SVM), were used to predict the response status of each compound based on the mRNA and mutation characteristics determined using statistical methods. The most suitable method for each drug was proposed by comparing the prediction accuracy of different ML methods, and the selected mRNA and mutation characteristics were identified as molecular features for the drug-responsive cancer subtype. Finally, the prognostic influence of molecular features on the mutational subtype of LUAD was examined in publicly available datasets. Results: Our analyses yielded 1,564 gene features and 45 mutational features for 46 drugs. Applying the ML approach to predict the drug response for each medication revealed an outstanding performance for SVM in predicting Afuresertib drug response (area under the curve [AUC] 0.875) using CIT, GAS2L3, STAG3L3, ATP2B4-mut, and IL15RA-mut as molecular features. Furthermore, the ANN algorithm using 9 mRNA characteristics demonstrated the highest prediction performance (AUC 0.780) for Gefitinib with CCL23-mut. Conclusion: This work extensively investigated the mRNA and mutation signatures associated with drug response in LUAD using a machine-learning approach and proposed a priority algorithm to predict drug response for different drugs.
Funding: National Natural Science Foundation of China (22225302, 21991151, 21991150, 22021001, 92161113, 91945301), the Fundamental Research Funds for the Central Universities (20720220009), the China Postdoctoral Science Foundation (2020M682079), and the Guangdong Basic and Applied Basic Research Foundation (2020A1515110539).
Abstract: GaP has been shown to be a promising photoelectrocatalyst for selective CO₂ reduction to methanol. Due to the relevance of the interface structure to important processes such as electron/proton transfer, a detailed understanding of the GaP(110)-water interfacial structure is of great importance. Ab initio molecular dynamics (AIMD) can be used to obtain microscopic information on the interfacial structure. However, the GaP(110)-water interface cannot converge to an equilibrated structure on the time scale of an AIMD simulation. In this work, we perform machine-learning-accelerated molecular dynamics (MLMD) to overcome the insufficient sampling of AIMD. With the help of MLMD, we unravel the microscopic structure of the GaP(110)-water interface and obtain a deeper understanding of the mechanisms of proton transfer at this interface, which will pave the way for gaining valuable insights into photoelectrocatalytic mechanisms and improving the performance of photoelectrochemical cells.
Abstract: Using a vectorized parallel Lennard-Jones fluid program, we have demonstrated that vectorizing a general-purpose parallel molecular package for simulating biomolecules, which currently runs on the Connection Machine CM-5 using CMMD message passing, would offer a significant improvement over a non-vectorized version. Our results indicate that the Lennard-Jones fluid program written in C*/CMMD is five times faster than the same program written in C/CMMD.
Funding: Supported by the Multi-Year Research Grants from the University of Macao (MYRG2019-00032-ICMS and MYRG2020-00113-ICMS) and the Macao FDCT research grant (0108/2021/A). Molecular modeling was performed at the High-Performance Computing Cluster (HPCC), which is supported by the Information and Communication Technology Office (ICTO) of the University of Macao.
Abstract: The liposome is one of the most widely used carriers for drug delivery because of its great biocompatibility and biodegradability. Due to the complex formulation components and preparation process, formulation screening mostly relies on a low-efficiency trial-and-error process. Here, liposome formulation prediction models have been built by machine learning (ML) approaches. The important parameters of liposomes, including size, polydispersity index (PDI), zeta potential and encapsulation, are predicted individually by the optimal ML algorithm, while the formulation features are also ranked to provide important guidance for formulation design. The analysis of key parameters reveals that drug molecules with logS in [-3, -6], molecular complexity in [500, 1000] and XLogP3 (≥2) are preferred for preparing liposomes with higher encapsulation. In addition, naproxen (NAP) and palmatine HCl (PAL), representing insoluble and water-soluble molecules respectively, are prepared as liposome formulations to validate the prediction ability. The consistency between predicted and experimental values verifies the satisfactory accuracy of the ML models. As the drug properties are critical for liposome particles, the molecular interactions and dynamics of NAP and PAL liposomes are further investigated by coarse-grained molecular dynamics simulations. The modeled structure reveals that NAP molecules could distribute into the lipid layer, while most PAL molecules aggregate in the inner aqueous phase of the liposome. The completely different physical states of NAP and PAL confirm the importance of drug properties for liposome formulations. In summary, general prediction models are built to predict liposome formulations, and the impacts of key factors are analyzed by combining ML with molecular modeling. The availability and rationality of these intelligent prediction systems have been demonstrated in this study, and they could be applied to liposome formulation development in the future.
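The property windows reported above for higher encapsulation can be written as a simple screening rule. The sketch below is only an illustration: it assumes the bracketed ranges are inclusive, the class and function names are hypothetical, and the example compounds carry made-up property values rather than measured data for any real drug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugProperties:
    """Hypothetical container for the three descriptors named in the abstract."""
    name: str
    log_s: float        # aqueous solubility, logS
    complexity: float   # molecular complexity
    xlogp3: float       # computed partition coefficient, XLogP3


def favours_high_encapsulation(drug: DrugProperties) -> bool:
    """Return True when all three reported windows are satisfied.

    Windows (assumed inclusive): -6 <= logS <= -3,
    500 <= complexity <= 1000, XLogP3 >= 2.
    """
    return (
        -6.0 <= drug.log_s <= -3.0
        and 500.0 <= drug.complexity <= 1000.0
        and drug.xlogp3 >= 2.0
    )


# Made-up candidates, for illustration only.
candidates = [
    DrugProperties("compound_a", log_s=-4.2, complexity=640.0, xlogp3=3.1),
    DrugProperties("compound_b", log_s=-1.0, complexity=640.0, xlogp3=3.1),
]
for c in candidates:
    print(c.name, favours_high_encapsulation(c))
```

A rule like this is a coarse pre-filter; the abstract's ML models predict encapsulation from the full formulation, not from these three descriptors alone.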
Abstract: The drug development process takes a long time because it requires sorting through a large number of inactive compounds in a large collection chosen for study and selecting only the most pertinent compounds that can bind to a disease protein. The use of virtual screening in pharmaceutical research is growing in popularity, and it is crucial during the early phases of medication research and development. Chemical compound searches are now more narrowly targeted. Because the databases contain more and more ligands, this method needs to be quick and exact. Neural network fingerprints were created more effectively than the well-known Extended Connectivity Fingerprint (ECFP). Although the conventional graph network generates a better-encoded fingerprint, only the largest sub-graph is taken into consideration to learn the representation. When an average or maximum pooling layer is used, the fingerprint also contains unrelated data. This article proposes the Graph Convolutional Attention Network (GCAN), a graph neural network with an attention mechanism, to address these problems. Additionally, it gives more weight to the nodes or sub-graphs that are used to create the molecular fingerprint. The generated fingerprint is used to classify drugs using ensemble learning. Ensemble stacking is applied with Support Vector Machines (SVM), Random Forest, Naive Bayes, Decision Trees, AdaBoost, and Gradient Boosting as base classifiers. When compared to existing models, the proposed GCAN fingerprint with an ensemble model achieves relatively high accuracy, sensitivity, specificity, and area under the curve. Additionally, our ensemble learning with the generated molecular fingerprint yields 91% accuracy, outperforming earlier approaches.
Funding: Financial support from the Science Challenge Project (TZ2018004) and the National Natural Science Foundation of China (21875228 and 21702195).
Abstract: Finding energetic materials with tailored properties is always a significant challenge due to the low research efficiency of trial and error. Herein, a methodology combining domain knowledge, a machine learning algorithm, and experiments is presented for accelerating the discovery of novel energetic materials. A high-throughput virtual screening (HTVS) system is established that integrates on-demand molecular generation with machine learning models covering the prediction of molecular properties and crystal packing mode scoring. With the proposed HTVS system, candidate molecules with promising properties and a desirable crystal packing mode are rapidly targeted from the generated molecular space containing 25112 molecules. Furthermore, a study of the crystal structure and properties shows that the good comprehensive performance of the target molecule is in agreement with the predicted results, thus verifying the effectiveness of the proposed methodology. This work demonstrates a new research paradigm for discovering novel energetic materials and can be extended to other organic materials without manifest obstacles.
Funding: This work used the Extreme Science and Engineering Discovery Environment (XSEDE) Bridges system, which is supported by National Science Foundation Grant Number ACI-1548562.
Abstract: Defects in graphene can profoundly impact its extraordinary properties, ultimately influencing the performance of graphene-based nanodevices. Methods to detect defects with atomic resolution in graphene can be technically demanding and involve complex sample preparation. An alternative approach is to observe the thermal vibration properties of the graphene sheet, which reflect defect information but in an implicit fashion. Machine learning, an emerging data-driven approach that offers solutions for learning hidden patterns from complex data, has been extensively applied in material design and discovery problems. In this paper, we propose a machine learning-based approach to detect graphene defects by discovering the hidden correlation between defect locations and thermal vibration features. Two prediction strategies are developed: an atom-based method, which constructs data by atom indices, and a domain-based method, which constructs data by domain discretization. Results show that while the atom-based method is capable of detecting a single-atom vacancy, the domain-based method can detect an unknown number of multiple vacancies up to atomic precision. Both methods can achieve approximately 90% prediction accuracy on the data reserved for testing, indicating a promising extrapolation to unseen graphene configurations. The proposed strategy offers promising solutions for the non-destructive evaluation of nanomaterials and accelerates new material discoveries.
Funding: Creative Research Groups of the National Natural Science Foundation of China (Grant No. 51921006) and the National Natural Science Foundation of China (Grant No. 52322803).
Abstract: Zirconia has been extensively used in aerospace, military, biomedical and industrial fields due to its unusual combination of high mechanical, electrical and thermal properties. However, the fundamental and critical phase transition process of zirconia has not been well studied because its first-order phase transitions have formidable energy barriers. Here, we generated a machine learning interatomic potential with ab initio accuracy to uncover the mechanisms behind the phase transitions of zirconia at ambient pressure. The machine learning potential precisely characterized atomic interactions among all zirconia allotropes and liquid zirconia over a wide temperature range. We realized the challenging reversible first-order monoclinic-tetragonal and cubic-liquid phase transition processes with enhanced sampling techniques. From the thermodynamic information, we obtained a better understanding of the thermal hysteresis phenomenon in the martensitic monoclinic-tetragonal transition. The phase diagram of zirconia from molecular dynamics simulations based on our machine learning potential corresponded well with experimental results.
Funding: Supported by the National Key R&D Program of China (2021YFD2100104), the Science and Technology Programs of Zhejiang (2019C02085), and the Modern Agricultural Technical Foundation of China (CARS-42-25).
Abstract: In this study, 10 novel anti-inflammatory peptides were identified from duck liver, and their molecular mechanism was demonstrated based on machine learning and molecular docking. Using Sephadex G-15 gel chromatography separation, reversed-phase high-performance liquid chromatography purification, liquid chromatography-tandem mass spectrometry identification, and BIOPEP database comparison, 10 novel anti-inflammatory peptides were initially found. Their strong angiotensin-converting enzyme (ACE) inhibition and anti-inflammatory properties were confirmed by machine learning. With binding energies less than -20.93 kJ/mol, molecular docking revealed that they could efficiently bind to the active pockets of tumor necrosis factor α (TNF-α), interleukin 6 (IL-6), cyclooxygenase 2 (COX-2), and nuclear factor κB (NF-κB) proteins, indicating that the peptides can spontaneously form complexes with the protein binding pockets through hydrogen bonding and hydrophobic interactions. In the lipopolysaccharide-induced RAW264.7 cell model, the release of NO, TNF-α, and IL-6 and the mRNA expression of inflammatory factors (TNF-α, IL-6, COX-2, and NF-κB) were significantly inhibited by these peptides. We concluded that their anti-inflammatory effects might arise from inhibiting the phosphorylation of the inhibitor of NF-κB (IκBα) in the cytoplasm and preventing the translocation of NF-κB p65 from the cytoplasm to the nucleus, thereby regulating the NF-κB signaling pathway. This study is essential for the screening of anti-inflammatory peptides and the investigation of their mechanism of action.
Funding: The simulations in this work are supported by the High-Performance Computing Center of Central South University.
Abstract: The shear deformation mechanisms of diamond-like carbon (DLC) remain unclear, since its thickness of several micrometers limits detailed analysis of its microstructural evolution and mechanical performance, which in turn hinders the improvement of the friction and wear performance of DLC. This study investigates this issue using molecular dynamics simulation and machine learning (ML) techniques. The results indicate that the changes in the mechanical properties of DLC are mainly due to the expansion and reduction of sp³ networks, causing the stick-slip patterns in shear force. In addition, cluster analysis shows that the sp²-sp³ transitions arise in the stick stage, while the sp³-sp² transitions occur in the slip stage. To analyze the mechanisms governing bond breaking/re-formation in these transitions, a Random Forest (RF) model is used, which identifies the kinetic energies of sp³ atoms and their velocities along the loading direction as having the highest influence. This is because high kinetic energies of atoms can exacerbate the instability of the bonding state and increase the probability of bond breaking/re-formation. Finally, the RF model finds that the shear force of DLC is highly correlated with its potential energy, and less correlated with its content of sp³ atoms. Since the changes in potential energy are caused by the variations in the content of sp³ atoms and localized strains, potential energy is an ideal parameter for evaluating the shear deformation of DLC. The results can enhance the understanding of the shear deformation of DLC and support the improvement of its frictional and wear performance.
Funding: Supported by the "Zhishan" Scholars Programs of Southeast University, the Jiangsu Innovation Team Program, and the Fundamental Research Funds for the Central Universities.
Abstract: Natural molecular machines have inspired the development of artificial molecular machines, which have the potential to revolutionize several areas of technology. Artificial molecular machines commonly employ molecular switches, molecular motors, and molecular shuttles as fundamental building blocks. Observing artificial molecular machines constructed from these building blocks can be highly challenging due to their small sizes and intricate behaviors. Modern instrumentation and advanced observational techniques play a crucial role in the observation and characterization of molecular machines. Furthermore, a well-designed molecular structure is also a critical factor in making molecular machines more observable. This review summarizes, from diverse perspectives, the common methods used to observe molecular machines and emphasizes the significance of understanding their behaviors in the design of superior artificial molecular machines.
Funding: Supported by the National Key Research and Development Program of China (2022YFA1004302).
Abstract: Accurate prediction of protein-ligand complex structures is a crucial step in structure-based drug design. Traditional molecular docking methods exhibit limitations in terms of accuracy and sampling space, while relying on machine-learning approaches may lead to invalid conformations. In this study, we propose a novel strategy that combines molecular docking and machine learning methods. First, the protein-ligand binding poses are predicted using a deep learning model. Subsequently, position-restricted docking on the predicted binding poses is performed using Uni-Dock, generating physically constrained and valid binding poses. Finally, the binding poses are re-scored and ranked using machine learning scoring functions. This strategy harnesses the predictive power of machine learning and the physical constraints of molecular docking. Evaluation experiments on multiple datasets demonstrate that, compared to using molecular docking or machine learning methods alone, our proposed strategy can significantly improve the success rate and accuracy of protein-ligand complex structure predictions.