Funding: Supported by the Hong Kong Research Grants Council (Project No. GRF16300918), the National Key R&D Program of China (Grant Nos. 2016YFA0300603 and 2016YFA0302400), and the National Natural Science Foundation of China (Grant No. 11774398).
Abstract: A kinetic energy (KE) functional is crucial for speeding up density functional theory calculations. However, deriving it accurately through traditional physical reasoning is challenging. We develop a generally applicable KE functional estimator for one-dimensional (1D) extended systems using machine learning. Our end-to-end solution combines a dimensionality reduction method with Gaussian process regression and a simple scaling method that adapts it to various 1D lattices. In addition to reaching chemical accuracy in KE calculations, our estimator also performs well in predicting the KE functional derivative. Integrating this machine-learned KE functional into the current orbital-free density functional theory scheme yields the expected ground-state electron density.
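The regression step named in the abstract can be illustrated with a minimal sketch of the Gaussian process posterior mean. This is a toy under stated assumptions, not the authors' estimator: the squared-exponential kernel, the `length` and `noise` values, and the function names `rbf`, `solve`, `gpr_fit`, and `gpr_predict` are all hypothetical, and the one-component feature vectors merely stand in for whatever reduced density descriptors the paper uses.

```python
import math

def rbf(x, y, length=1.0):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * d2 / length ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gpr_fit(X, y, length=1.0, noise=1e-6):
    """Return alpha = (K + noise * I)^-1 y for the training set."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, y)

def gpr_predict(X, alpha, x_new, length=1.0):
    """Posterior mean at x_new: sum_i alpha_i * k(x_i, x_new)."""
    return sum(a * rbf(x, x_new, length) for a, x in zip(alpha, X))

# Toy target standing in for a KE value as a function of one descriptor.
X_train = [[0.5 * i] for i in range(7)]
y_train = [math.sin(x[0]) for x in X_train]
alpha = gpr_fit(X_train, y_train)
prediction = gpr_predict(X_train, alpha, [1.25])
```

Because the posterior mean is a smooth sum of kernels, it can be differentiated analytically, which is one reason a kernel model is a natural fit when the functional derivative is needed as well as the value.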
Funding: Funded by the National Science Foundation of China (62006068), the Hebei Natural Science Foundation (A2021402008), the Natural Science Foundation of Scientific Research Project of Higher Education in Hebei Province (ZD2020185, QN2020188), and the 333 Talent Supported Project of Hebei Province (C20221026).
Abstract: Imbalanced datasets are common in practical applications, and oversampling methods using fuzzy rules have been shown to enhance the classification performance on imbalanced data by taking into account the relationships between data attributes. However, the creation of fuzzy rules typically depends on expert knowledge, which may be subjective and may not fully leverage the label information in the training data. To address this issue, a novel fuzzy rule oversampling approach is developed based on the learning vector quantization (LVQ) algorithm. In this method, the label information of the training data is used to determine the antecedent part of If-Then fuzzy rules by dynamically dividing attribute intervals using LVQ. Subsequently, fuzzy rules are generated and adjusted to calculate rule weights. The number of new samples to be synthesized for each rule is then computed, and minority-class samples are synthesized based on the newly generated fuzzy rules. To evaluate the effectiveness of this method, comparative experiments against five other sampling techniques, each combined with the support function machine, are conducted on 12 publicly available imbalanced datasets. The experimental results demonstrate that the proposed method significantly improves the classifier across seven performance indicators, including gains of 2.15% to 12.34% in Accuracy, 6.11% to 27.06% in G-mean, and 4.69% to 18.78% in AUC. These results show that the proposed method improves the classification performance on imbalanced data more effectively.
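As a rough illustration of how label information can drive the interval division, here is a minimal sketch of the standard LVQ1 update on a single attribute. The abstract does not say how prototypes are turned into intervals, so taking midpoints between sorted prototypes is an assumption, as are the names `lvq1` and `interval_cuts`, the learning rate, and the toy data.

```python
def lvq1(samples, labels, prototypes, proto_labels, lr=0.1, epochs=20):
    """Train 1-D prototypes with the LVQ1 rule and return the updated list."""
    protos = list(prototypes)
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, y in zip(samples, labels):
            # Find the prototype nearest to the sample.
            k = min(range(len(protos)), key=lambda i: abs(x - protos[i]))
            step = rate * (x - protos[k])
            # Attract if the labels agree, repel otherwise.
            protos[k] += step if proto_labels[k] == y else -step
    return protos

def interval_cuts(protos):
    """Assumed rule: cut the attribute axis midway between adjacent prototypes."""
    s = sorted(protos)
    return [(a + b) / 2 for a, b in zip(s, s[1:])]

# Toy attribute: class 0 clusters near 1.0, class 1 near 5.0.
samples = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
labels = [0, 0, 0, 1, 1, 1]
trained = lvq1(samples, labels, prototypes=[2.0, 4.0], proto_labels=[0, 1])
cuts = interval_cuts(trained)
```

Each resulting interval could then serve as one fuzzy set in the antecedent of an If-Then rule, replacing cut points that would otherwise be chosen by an expert.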
Funding: Supported by the National Key Research and Development Program of China (2022YFA1004302).
Abstract: Accurate prediction of protein-ligand complex structures is a crucial step in structure-based drug design. Traditional molecular docking methods are limited in accuracy and sampling space, while relying on machine learning approaches alone may lead to invalid conformations. In this study, we propose a novel strategy that combines molecular docking and machine learning methods. First, the protein-ligand binding poses are predicted using a deep learning model. Subsequently, position-restricted docking on the predicted binding poses is performed using Uni-Dock, generating physically constrained and valid binding poses. Finally, the binding poses are re-scored and ranked using machine learning scoring functions. This strategy combines the predictive power of machine learning with the physical constraints of molecular docking. Evaluation experiments on multiple datasets demonstrate that, compared with using molecular docking or machine learning methods alone, the proposed strategy significantly improves the success rate and accuracy of protein-ligand complex structure prediction.
Funding: Supported by the National Basic Research Program of China ("973" Project) (Grant No. 2013CB733100) and the National Natural Science Foundation of China (Grant No. 11332012).
Abstract: In this paper, new solutions are proposed for the problem of pose estimation from correspondences between 3D model lines and 2D image lines. Traditional line-based pose estimation methods rely on the assumption that the noise terms (perpendicular to the line) at the two endpoints are statistically independent. However, these two noise terms are in fact negatively correlated when the image line segment is fitted using the least-squares technique. Therefore, we design a new error function expressed as the average integral of the distance between line segments. Three least-squares techniques that exploit the new error function and optimize the rotation and translation simultaneously are proposed. In addition, the Lie group formalism is used to describe the pose parameters, so that the optimization problem can be solved by a simple iterative least-squares method. To enhance robustness to outliers in the match data, an M-estimation method is developed that converts the pose optimization problem into an iteratively reweighted least-squares problem. The proposed methods are validated through experiments on both synthetic and real-world data. The experimental results show that the proposed methods yield clearly higher precision than the traditional methods.
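The conversion of an M-estimation problem into iteratively reweighted least squares can be sketched on a much simpler problem than pose estimation. The example below fits a 2D line y = a*x + b and is not the authors' method: the Huber weight function, the threshold `delta`, the iteration count, and the names `weighted_line_fit`, `huber_weight`, and `irls_line_fit` are all assumptions, since the abstract states only that an M-estimator is used.

```python
def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for y = a*x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def huber_weight(r, delta):
    """Huber M-estimator weight: 1 for small residuals, delta/|r| for large ones."""
    return 1.0 if abs(r) <= delta else delta / abs(r)

def irls_line_fit(xs, ys, delta=1.0, iters=50):
    """Alternate between a weighted fit and recomputing weights from residuals."""
    ws = [1.0] * len(xs)
    for _ in range(iters):
        a, b = weighted_line_fit(xs, ys, ws)
        ws = [huber_weight(y - (a * x + b), delta) for x, y in zip(xs, ys)]
    return weighted_line_fit(xs, ys, ws)

# Points on y = 2x + 1 with one gross outlier at x = 9.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[9] += 30.0
a_ols, b_ols = weighted_line_fit(xs, ys, [1.0] * len(xs))
a_rob, b_rob = irls_line_fit(xs, ys)
```

In this toy case the ordinary least-squares slope is pulled well away from 2 by the single outlier, while the reweighted fit stays close to it, because each pass shrinks the weight of any point whose residual exceeds `delta`.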