Molecular reconstruction is a rapid and reliable way to provide the molecular detail of petroleum fractions that is required for kinetic modeling of petroleum conversion processes at the molecular level. In the typical stochastic reconstruction method, estimating the properties of pseudo molecules generated by Monte Carlo sampling depends on building predefined molecular libraries, which is expensive and not feasible for certain petroleum fractions. In this paper, a novel stochastic reconstruction strategy is proposed, based on a stratified library of structural descriptors. Properties of the pseudo molecules generated by this strategy can be estimated directly by a group contribution method, without predefined molecular libraries. In this strategy, the molecule-building diagram comprises two steps. First, the ring structure is configured by determining the number of rings. Second, instead of the chain length used in the traditional stochastic reconstruction method, the numbers of structural descriptors (SDs) for the binding site and the chain are determined sequentially to configure the binding site and the saturated acyclic hydrocarbon chain. These structural descriptors were selected from group contribution methods. To count the partially overlapping sections between structural descriptors for the chain, two supplementary structural descriptors were created. All possible saturated structures of hydrocarbon chains can be represented by structural descriptors at the scale of property estimation. This strategy separates the building of a predefined molecular library from the stochastic reconstruction process. The exact structures of the pseudo molecules represented by structural descriptors in this work can be determined with sufficient chemical knowledge. Fifty naphtha samples were tested independently to demonstrate the performance of the proposed strategy, and the results show that the estimated properties were close to the experimental values. This strategy will benefit molecular management in the petrochemical industry and thereby improve economic and environmental efficiency.
New descriptors were constructed and the structures of some oxygen-containing organic compounds were parameterized. Multiple linear regression (MLR) and partial least squares regression (PLS) were employed to build two models relating the structures to the octanol/water partition coefficients (LogP) of the compounds. The modeling correlation coefficients (R) were 0.976 and 0.922, and the leave-one-out cross-validation correlation coefficients (R(CV)) were 0.973 and 0.909, respectively. The results showed that the structural descriptors characterize the molecular structures of the compounds well, and that the stability and predictive power of the models were good.
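The MLR fit and leave-one-out (LOO) cross-validation recur in several of the abstracts in this listing, so a minimal sketch may help. It assumes that R is the Pearson correlation between fitted and observed values and that R(CV) is the same correlation computed on LOO predictions; individual papers may define R(CV) differently (for example via PRESS). The toy descriptors and property values are made up, and the function names are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, x):
    """Predicted property for one descriptor vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def loo_predictions(X, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, X[i]))
    return preds


# Made-up descriptors and property values, for illustration only.
X_TOY = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0], [6.0, 1.0]]
Y_TOY = [0.95, 1.55, 1.72, 2.38, 2.47, 3.25]

if __name__ == "__main__":
    coef = fit_mlr(X_TOY, Y_TOY)
    r_fit = pearson_r([predict(coef, x) for x in X_TOY], Y_TOY)
    r_cv = pearson_r(loo_predictions(X_TOY, Y_TOY), Y_TOY)
    print(f"R = {r_fit:.3f}, R(CV) = {r_cv:.3f}")
```

A cross-validated coefficient close to the fitted one, as reported in the abstract, suggests the model is not dominated by any single compound.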
A molecular structural characterization (MSC) method called the molecular vertexes correlative index (MVCI) was used to describe the structures of 30 substituted aromatic compounds. Through multiple linear regression (MLR) and stepwise multiple regression (SMR), a quantitative structure-toxicity relationship (QSTR) model with 4 variables was obtained. The correlation coefficient (R) of the model was 0.9467. Through partial least-squares regression (PLS), another QSTR model with 5 principal components was obtained. The correlation coefficient (R) of this model was 0.9518. Both models were evaluated by leave-one-out (LOO) cross-validation, and the cross-validation (CV) correlation coefficients (Rcv) were 0.9208 and 0.9214, respectively. The results suggested good stability and predictability of the models, and the molecular vertexes correlative index could successfully describe the structures of the substituted aromatic compounds.
The three-dimensional holographic vector of atomic interaction field (3D-HoVAIF) is used to characterize the molecular structures of 45 nitroaromatic compounds. Two quantitative structure-toxicity relationship (QSAR) models are built by stepwise regression (SMR), multiple linear regression (MLR) and partial least-squares regression (PLS). The correlation coefficients (R) of the models are 0.960 and 0.961, respectively. The models are then evaluated by leave-one-out (LOO) cross-validation, and the correlation coefficients (RCV) are 0.949 and 0.941, respectively. The results show that the descriptors can successfully describe the structures of organic compounds. The stability and predictability of the models are satisfactory.
A new molecular structural characterization (MSC) method called the molecular vertexes correlative index (MVCI) was constructed in this paper. The index was used to describe the structures of 45 compounds, and a quantitative structure-activity relationship (QSAR) model of toxicity (–lgEC50) was obtained through multiple linear regression (MLR) and stepwise multiple regression (SMR). The correlation coefficient (R) of the model was 0.912, and the standard deviation (SD) of the model was 0.525. The estimation stability and prediction ability of the model were analyzed by both internal and external validation. The leave-one-out (LOO) cross-validation (CV) correlation coefficient (RCV) was 0.816 and the standard deviation (SDCV) was 0.739. For the external validation, the correlation coefficient (Rtest) was 0.905 and the standard deviation (SDtest) was 0.520. The results showed that the index was superior in molecular structural representation. The stability and predictability of the model were good.
Most organic molecules are composed of atoms such as carbon, oxygen, nitrogen, sulfur and halogens. Based on the three-dimensional structure of a molecule, a molecular structural characterization (MSC) method called the improved molecular electronegativity-distance vector (I-MEDV) was developed. It was used to describe the structures of 37 compounds from Styrax japonicus Sieb. flowers. Through multiple linear regression (MLR), a QSRR model was built. The correlation coefficient (R1) of the model was 0.980. Then, 4 vectors were selected to build another model through stepwise multiple regression (SMR), and the correlation coefficient (R2) of that model was 0.975. Both models were evaluated by leave-one-out (LOO) cross-validation, and the correlation coefficients (Rcv) were 0.948 and 0.968, respectively. The results show that the I-MEDV could successfully describe the structures of organic compounds. The stability and predictability of the models were good.
This paper presents an efficient image feature representation method, the angle structure descriptor (ASD), which is built on the angle structures of images. Angle structures are defined in local blocks according to the diversity in directions. Combined with color information in the HSV color space, angle structures are used to detect images. The internal correlations between neighboring pixels in angle structures are explored to form a feature vector. With angle structures as bridges, ASD extracts image features by integrating several kinds of information as a whole, such as color, texture, shape and spatial layout. In addition, the proposed algorithm is efficient for image retrieval because it requires no clustering or model training. Experimental results demonstrate that ASD outperforms the other related algorithms.
LASP (large-scale atomistic simulation with neural network potential), software developed by our group since 2018, is a powerful platform (www.lasphub.com) for performing atomic simulation of complex materials. The software integrates the neural network (NN) potential technique with the global potential energy surface exploration method, and thus can be used widely for structure prediction and reaction mechanism exploration. Here we introduce our recent update, LASP version 3.0, focusing on the new functionalities, including advanced neural network training based on the multi-network framework and the newly introduced S^7 and S^8 power type structure descriptors (PTSDs). These new functionalities are designed to further improve the accuracy of potentials and to accelerate neural network training for multiple-element systems. Taking the Cu-C-H-O neural network potential and a heterogeneous catalytic model as the example, we show that these new functionalities can accelerate the training of a multi-element neural network potential by using the existing single-network potential as the input. The resulting double-network CuCHO potential is robust in simulation, and the introduction of the S^7 and S^8 PTSDs can reduce the root-mean-square error of the energy by a factor of two.
A new molecular structural characterization (MSC) method was constructed in this paper. The structure descriptors were used to describe the structures of 149 compounds. Through multiple linear regression (MLR) and stepwise multiple regression (SMR), a quantitative structure-retention relationship (QSRR) model with 6 variables was obtained. The correlation coefficient (R) of the model was 0.944. Through partial least-squares regression (PLS), another QSRR model with 5 principal components was obtained. The correlation coefficient (R) of this model was 0.941. The estimation stability and prediction ability of the two models were analyzed by both internal and external validation. For the internal validation, the leave-one-out (LOO) cross-validation (CV) correlation coefficients (RCV) were 0.931 and 0.932, respectively. For the external validation, the correlation coefficients (Rtest) of the two models were 0.907 and 0.932. The results suggested good stability and predictability of the models, and the predictions agree well with the experimental values. This paper provides a new and effective method for predicting chromatographic retention time.
Owing to increasing global demand for carbon-neutral and fossil-free energy systems, extensive research is being conducted on efficient and inexpensive electrocatalysts for the kinetically sluggish oxygen reduction reaction (ORR) at the cathode of fuel cells. Platinum (Pt)-based alloys are considered promising candidates for replacing expensive Pt catalysts. However, the current screening process for Pt-based alloys is time-consuming and labor-intensive, and the descriptors used to predict the activity of Pt-based catalysts are generally inaccurate. This study proposes a strategy that combines high-throughput first-principles calculations and machine learning to explore descriptors for screening Pt-based alloy catalysts with high Pt utilization and low Pt consumption. Among the 77 prescreened candidates, we identified 5 potential candidates for catalyzing ORR with low overpotential. Furthermore, during the second and third rounds of active learning, more Pt-based alloy ORR candidates were identified based on the relationship between the structural features of Pt-based alloys and their activity. In addition, we highlight the role of structural features in Pt-based alloys and find that the difference between the electronegativity of Pt and the heteroatom, the number of valence electrons of the heteroatom, and the ratio of heteroatoms around Pt are the main factors that affect ORR activity. More importantly, the combination of these structural features can be used as a structural descriptor for predicting the activity of Pt-based alloys. We believe the findings of this study will provide new insight for predicting ORR activity and contribute to the experimental exploration of Pt-based electrocatalysts with high Pt utilization and low Pt consumption.
By classifying the non-hydrogen atoms of organic compounds, applying parametric dyeing, and establishing the relationships between non-hydrogen atoms, new structure descriptors were obtained. The structures of 48 common allergenic fragrance organic compounds were parametrically characterized. Multiple linear regression (MLR) and partial least-squares regression (PLS) were used to build two models of the relationship between compound structure and chromatographic retention time. The stability of the models was evaluated by leave-one-out cross-validation, and the predictive ability of the models was tested using an external sample set. The correlation coefficients (R^2) of the two models are 0.9791 and 0.9744, those of the cross-validation (R(CV)^2) are 0.8542 and 0.7464, and those of the external prediction (R(test)^2) are 0.9802 and 0.9367, indicating that the models have good fitting ability, stability and external predictive capability. The structural factors affecting the chromatographic retention time of the compounds were analyzed. The results show that a compound with more secondary carbon atoms may have a larger chromatographic retention time (tR). This paper has reference value for the study of the relationship between the structures and properties of allergenic fragrance organic compounds.
Structure-based virtual screening (molecular docking) is now one of the most pragmatic techniques for leveraging target structure in ligand discovery. Accurate binding pose prediction is critical to molecular docking. Here, we describe a general strategy to improve the accuracy of docking pose prediction by implementing structural descriptor-based filtering and KGS-penalty function-based conformational clustering in an unbiased manner. We assessed our method against 150 high-quality protein-ligand complex structures. Surprisingly, such simple components are sufficient to improve the accuracy of docking pose prediction. The success rate of predicting a near-native docking pose increased from 53% of the targets to 78%. We expect that our strategy may be of general use in improving currently available molecular docking programs.
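The success-rate figure quoted above can be made concrete with a small sketch of how such a metric is commonly computed. The abstract does not define "near-native"; the 2.0 Å RMSD cutoff below is a widely used convention assumed here, and the function names and toy coordinates are hypothetical. The sketch covers only the evaluation metric, not the descriptor-based filtering or the KGS-penalty clustering.

```python
import math


def rmsd(pose, native):
    """Root-mean-square deviation (in Å) between two equally ordered atom lists."""
    if len(pose) != len(native):
        raise ValueError("pose and native must contain the same atoms")
    squared = sum((p[k] - n[k]) ** 2
                  for p, n in zip(pose, native)
                  for k in range(3))
    return math.sqrt(squared / len(pose))


def success_rate(top_poses, natives, cutoff=2.0):
    """Fraction of targets whose top-ranked pose lies within `cutoff` Å of native.

    `cutoff=2.0` is an assumed convention, not a value from the abstract.
    """
    hits = sum(1 for target in natives
               if rmsd(top_poses[target], natives[target]) <= cutoff)
    return hits / len(natives)


# Toy coordinates for two hypothetical targets, for illustration only.
NATIVES = {
    "target_a": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "target_b": [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)],
}
TOP_POSES = {
    "target_a": [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)],   # shifted by 0.5 Å
    "target_b": [(3.0, 0.0, 0.0), (3.0, 1.5, 0.0)],   # shifted by 3.0 Å
}

if __name__ == "__main__":
    print(f"success rate: {success_rate(TOP_POSES, NATIVES):.0%}")
```

No ligand alignment is performed before the RMSD is taken, because docking poses and the crystallographic ligand share the receptor's coordinate frame.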
Nitrogen (N) doping has been widely adopted to improve the light absorption of TiO2. However, the newly introduced N-2p states are largely localized and thus barely overlap with the O-2p states in the valence band of TiO2, resulting in a shoulder-like absorption edge. To realize an apparent overlap between the N-2p and O-2p states, charge compensation between N^3- and O^2- via electron transfer from oxygen vacancies (VO) to N dopants is one possible strategy. To verify this, among numerous doping configurations of N/VO-codoped anatase TiO2, we identified two types of VO-position-independent N-dopant spatial orderings through efficient screening enabled by a newly designed structural descriptor. Compared with the others, these two types of N-dopant spatial ordering are highly beneficial for charge compensation and produce an apparent overlap between the N-2p and O-2p states, thereby achieving a large bandgap narrowing. Furthermore, the two types of N-dopant spatial ordering can also be generalized to N/VO-codoped rutile TiO2 for bandgap narrowing.
Ceria-zirconia mixed oxides (CZMO) are widely used in many important fields of catalysis. However, pure CZMO is known to have poor thermal stability. In this paper, a strategy is proposed for designing Ce0.475Zr0.475M0.05O2 (M = La, Y, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Er, Lu, and Yb) oxide surfaces with high thermal stability by using first-principles molecular dynamics (FPMD) simulation and experiment. Through structure stability analysis at different temperatures, the surface energy γ as a function of R_ion/D_ave is identified as a quantitative structure descriptor for analyzing the effect of rare earth (RE) doping on the thermal stability of Ce0.475Zr0.475M0.05O2. By doping with a suitable RE element, γ can be adjusted to the optimal range to enhance the thermal stability of Ce0.475Zr0.475M0.05O2. With this strategy, the predicted order of thermal stability improvement is Y>La>Gd>Nd>Pr>Pm>Sm>Eu>Tb>Er>Yb>Lu, which was further verified by our experimental results. After thermal treatment at 1100 °C for 10 h, the specific surface areas (SSA) of the aged Y-CZ and La-CZ samples reach 21.34 and 19.51 m²/g, which are 63.02% and 49.04% higher than that of the undoped CZMO sample, because surface doping with Y and La inhibits the thermal displacement of surface atoms. The strategy proposed in this work is expected to provide a viable way to design highly efficient CZMO materials for a wide range of applications and to promote the use of the high-abundance rare earth elements Y and La.
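As a quick consistency check on the reported gains, and assuming that "X% higher" means a relative increase over the undoped sample (whose SSA is not stated in the abstract), both figures imply the same baseline of roughly 13.09 m²/g:

```latex
\[
\mathrm{SSA}_{\text{undoped}} \approx \frac{21.34}{1 + 0.6302} \approx 13.09\ \mathrm{m^2/g},
\qquad
\mathrm{SSA}_{\text{undoped}} \approx \frac{19.51}{1 + 0.4904} \approx 13.09\ \mathrm{m^2/g}.
\]
```

The baseline value is inferred from the two percentages, not quoted from the paper.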
Structure information plays an important role in both object recognition and detection. This paper studies what visual structure is and addresses the problem of structure modeling and representation from two aspects: visual feature and topology model. First, at the feature level, we propose the Local Structured Descriptor to capture an object's local structure effectively, and develop the descriptors from shape and texture information, respectively. Second, at the topology level, we present a local structured model with a boosted feature selection and fusion scheme. All experiments are conducted on the challenging PASCAL Visual Object Classes (VOC) datasets from VOC2007 to VOC2010. Experimental results show that our method achieves very competitive performance.
Funding (stochastic reconstruction of petroleum fractions): International (Regional) Cooperation and Exchange Project (61720106008); National Natural Science Fund for Distinguished Young Scholars (61925305); National Natural Science Foundation of China (61873093).
Funding (LogP models for oxygen-containing organic compounds): Youth Foundation of the Education Bureau, Sichuan Province (13ZB0003).
Funding (MVCI models for 30 substituted aromatic compounds): Foundation of the Education Bureau, Sichuan Province (09ZB036).
Funding (3D-HoVAIF models for nitroaromatic compounds): Youth Foundation of the Education Bureau, Sichuan Province (13ZB0003).
Funding (MVCI toxicity model for 45 compounds): Foundation of the Education Bureau, Sichuan Province (09ZB036); Technology Bureau, Sichuan Province (2006j13-141).
Funding (I-MEDV models for Styrax japonicus flower compounds): Youth Foundation of the Education Bureau, Sichuan Province (09ZB036); Technology Bureau, Sichuan Province (2006j13-141).
Funding (angle structure descriptor): National Natural Science Foundation of China (Nos. 61170145, 61373081, 61402268, 61401260, 61572298); Technology and Development Project of Shandong (No. 2013GGX10125); Natural Science Foundation of Shandong, China (Nos. BS2014DX006, ZR2014FM012); Taishan Scholar Project of Shandong, China.
Funding (LASP 3.0): National Key Research and Development Program of China (No. 2018YFA0208600); National Natural Science Foundation of China (Nos. 91945301, 22033003, 92061112, 22122301, and 91745201).
Funding (QSRR models for 149 compounds): Foundation of the Education Bureau, Sichuan Province (09ZB036); Technology Bureau, Sichuan Province (2006j13-141).
Funding (Pt-based alloy ORR catalysts): National Natural Science Foundation of China, Grant/Award Numbers: 51702352, 21975280, 22102208, 52173234, 52202214; Young Elite Scientist Sponsorship Program by CAST, Grant/Award Number: YESS20210226; Shenzhen Science and Technology Program, Grant/Award Numbers: RCJC20200714114435061, JCYJ20210324102008023, JSGG20210802153408024; Shenzhen-Hong Kong-Macao Technology Research Program, Grant/Award Number: Type C, SGDX2020110309300301; Natural Science Foundation of Guangdong Province, Grant/Award Numbers: 2022A1515010554, 2023A1515030178; CCF-Tencent Open Fund and Innovation and Program for Excellent Young Researchers of SIAT, Grant/Award Number: E1G041.
Abstract: Owing to increasing global demand for carbon-neutral and fossil-free energy systems, extensive research is being conducted on efficient and inexpensive electrocatalysts for the kinetically sluggish oxygen reduction reaction (ORR) at the cathode of fuel cells. Platinum (Pt)-based alloys are considered promising candidates for replacing expensive Pt catalysts. However, the current screening process for Pt-based alloys is time-consuming and labor-intensive, and the descriptors used to predict the activity of Pt-based catalysts are generally inaccurate. This study proposes a strategy that combines high-throughput first-principles calculations with machine learning to explore a descriptor for screening Pt-based alloy catalysts with high Pt utilization and low Pt consumption. Among the 77 prescreened candidates, we identified 5 potential candidates for catalyzing the ORR with low overpotential. During the second and third rounds of active learning, further Pt-based alloy ORR candidates were identified based on the relationship between the structural features of Pt-based alloys and their activity. In addition, we highlight the role of structural features in Pt-based alloys and find that the difference in electronegativity between Pt and the heteroatom, the number of valence electrons of the heteroatom, and the ratio of heteroatoms around Pt are the main factors that affect ORR activity. More importantly, the combination of these structural features can be used as a structural descriptor for predicting the activity of Pt-based alloys. We believe the findings of this study will provide new insight for predicting ORR activity and contribute to the experimental exploration of Pt-based electrocatalysts with high Pt utilization and low Pt consumption.
Funding: Supported by the Youth Foundation of Sichuan Provincial Department of Education (18ZB0323).
Abstract: By classifying the non-hydrogen atoms of organic compounds, applying parametric dyeing, and establishing the relationships between non-hydrogen atoms, new structure descriptors were obtained. The structures of 48 common allergenic fragrance organic compounds were parametrically characterized. Multiple linear regression (MLR) and partial least-squares regression (PLS) were used to build two models relating compound structure to chromatographic retention time. The stability of the models was evaluated by "leave-one-out" cross-validation, and their predictive ability was tested using an external sample set. The correlation coefficients (R^(2)) of the two models are 0.9791 and 0.9744, those of the cross-validation (R_(CV)^(2)) are 0.8542 and 0.7464, and those of the external prediction (R_(test)^(2)) are 0.9802 and 0.9367, indicating that the models have good fitting ability, stability, and external predictive capability. The structural factors affecting the chromatographic retention time of the compounds were analyzed. The results show that a compound with more secondary carbon atoms may have a larger chromatographic retention time (tR). This paper offers a reference for studying the relationship between the structures and properties of allergenic fragrance organic compounds.
Abstract: Structure-based virtual screening (molecular docking) is now one of the most pragmatic techniques for leveraging target structure in ligand discovery. Accurate binding pose prediction is critical to molecular docking. Here, we describe a general strategy to improve the accuracy of docking pose prediction by implementing structural descriptor-based filtering and KGS-penalty function-based conformational clustering in an unbiased manner. We assessed our method against 150 high-quality protein–ligand complex structures. Surprisingly, such simple components are sufficient to improve the accuracy of docking pose prediction: the success rate of predicting a near-native docking pose increased from 53% of the targets to 78%. We expect that our strategy may be of general use in improving currently available molecular docking programs.
Funding: Financially supported by the National Natural Science Foundation of China (Nos. 51972312, 51825204, 21633009).
Abstract: Nitrogen (N) doping has been widely adopted to improve the light absorption of TiO_(2). However, the newly introduced N-2p states are largely localized and thus barely overlap with the O-2p states in the valence band of TiO_(2), resulting in a shoulder-like absorption edge. One possible strategy for realizing an apparent overlap between the N-2p and O-2p states is charge compensation between N^(3-) and O^(2-) via electron transfer from oxygen vacancies (V_(O)) to N dopants. To verify this, among numerous doping configurations of N/V_(O)-codoped anatase TiO_(2), we identified two types of V_(O)-position-independent N-dopant spatial orderings by efficient screening enabled by a newly designed structural descriptor. Compared with the others, these two types of N-dopant spatial orderings are highly beneficial for charge compensation and produce an apparent overlap between the N-2p and O-2p states, thereby achieving a large bandgap narrowing. Furthermore, the two types of N-dopant spatial orderings can also be generalized to N/V_(O)-codoped rutile TiO_(2) for bandgap narrowing.
Funding: Project supported by the China Postdoctoral Science Foundation (2020M680616) and the Major State Research Development Program of Hebei Province (20374202D).
Abstract: Ceria-zirconia mixed oxides (CZMO) are widely used in many important fields of catalysis. However, pure CZMO is known to have poor thermal stability. In this paper, a strategy is proposed for designing Ce_(0.475)Zr_(0.475)M_(0.05)O_(2) (M = La, Y, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Er, Lu, and Yb) oxide surfaces with high thermal stability by combining first-principles molecular dynamics (FPMD) simulation with experiment. Through structure stability analysis at different temperatures, the surface energy γ as a function of R_(ion)/D_(ave) is identified as a quantitative structure descriptor for analyzing the effect of rare earth (RE) doping on the thermal stability of Ce_(0.475)Zr_(0.475)M_(0.05)O_(2). By doping with a suitable RE element, γ can be adjusted to the optimal range to enhance the thermal stability of Ce_(0.475)Zr_(0.475)M_(0.05)O_(2). With this strategy, the predicted sequence of thermal stability improvement is Y > La > Gd > Nd > Pr > Pm > Sm > Eu > Tb > Er > Yb > Lu, which was further verified by our experimental results. After thermal treatment at 1100 ℃ for 10 h, the specific surface areas (SSA) of the aged Y-CZ and La-CZ samples reach 21.34 and 19.51 m^(2)/g, which are 63.02% and 49.04% higher, respectively, than that of the undoped CZMO sample, because surface doping with Y and La inhibits the thermal displacement of surface atoms. In summary, the strategy proposed in this work is expected to provide a viable way to design highly efficient CZMO materials for a wide range of applications and to promote the use of the high-abundance rare-earth elements Y and La.
Abstract: Structure information plays an important role in both object recognition and detection. This paper studies what visual structure is and addresses the problem of structure modeling and representation from two aspects: visual feature and topology model. First, at the feature level, we propose the Local Structured Descriptor to capture an object's local structure effectively, and develop the descriptors from shape and texture information, respectively. Second, at the topology level, we present a local structured model with a boosted feature selection and fusion scheme. All experiments are conducted on the challenging PASCAL Visual Object Classes (VOC) datasets from VOC2007 to VOC2010. Experimental results show that our method achieves very competitive performance.