Funding: Supported by the National High-Tech R&D Programme of China (863) (Grant No. 2006AA02Z312); National 111 Programme Introducing Talents of Discipline to Universities (Grant No. 0507111106); National Chunhui Project (Grant No. 990404+00307); State New Drug Project (Grant No. 1996ND1035A01); Fok Ying Tung Educational Foundation (Grant No. 980706); State Key Laboratory of Chemo/Biosensing and Chemometrics Foundation (KCBCF0501201); Chongqing University Innovation Fund (CUIF030506); Chongqing Municipality Applied Science Fund (Grant No. CASF01-3-6); Momentous Juche Innovation Fund for Tackle Key Problem Items (MJIF 03-5-6+04-10-10).
Abstract: Working only from the primary structures of peptides, a new set of descriptors called the molecular electronegativity edge-distance vector (VMED) was proposed and applied to describe and characterize the molecular structures of oligopeptides and polypeptides. The descriptors are based on the electronegativity of each atom, or the electronic charge index (ECI) of atomic clusters, and on the bonding distance between atom pairs. The molecular structures of antigenic polypeptides were expressed in this way in order to propose an automated technique for the computerized identification of helper T lymphocyte (Th) epitopes. Furthermore, a modified MED vector was proposed from the primary structures of polypeptides, based on the ECI and the relative bonding distance of the fundamental skeleton groups; the side chain of each amino acid was treated as a pseudo-atom. The resulting VMED was easy to calculate and practical to use. Quantitative models were established for 28 immunogenic or antigenic polypeptides (AGPP), with 14 Ad-restricted sequences (1-14) and 14 sequences of other restrictions assigned activities of "1" (+) and "0" (-), respectively. The latter comprised 6 Ab (15-20), 3 Ak (21-23), 2 Ek (24-26) and 2 H-2k (27 and 28) restricted sequences. Good results were obtained, with 90% correct classification (2 wrong of 20 training samples) and 100% correct prediction (none wrong of 8 test samples) in one case and, by contrast, 100% correct classification (none wrong of 20 training samples) and 88% correct prediction (1 wrong of 8 test samples) in the other. Both random sampling and cross-validation were performed to demonstrate good performance. The method may also be suitable for estimating and predicting class I and class II major histocompatibility antigen (MHC) epitopes in humans. It should be useful in the immune identification and recognition of proteins and genes and in the design and development of subunit vaccines.
Several quantitative structure-activity relationship (QSAR) models were developed by multiple linear regression (MLR) for various oligopeptides and polypeptides, including 58 dipeptides and 31 pentapeptides with angiotensin-converting enzyme (ACE) inhibitory activity. To examine how well VMED characterizes the molecular structure of polypeptides, quantitative sequence-activity models (QSAMs) were built for functional prediction of polypeptide sequences with antigenic activity and heptapeptide sequences with tachykinin activity. The results showed that VMED combined excellent structural selectivity with good activity prediction, and that it performed well for both QSAR and QSAM of poly- and oligopeptides, with estimation ability and predictive power equal to or better than those reported in previous references. A preliminary conclusion was drawn: both the classical and the modified MED vectors are very useful structural descriptors. Some suggestions were proposed for further studies on QSAR/QSAM of proteins in various fields.
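The abstract above says the descriptor is built from atomic electronegativities and bonding distances between atom pairs, but it does not give the formula. The following is only a minimal sketch of that general idea, under stated assumptions: atoms are grouped into hypothetical integer types, each atom carries an electronegativity-like weight, distance is counted in bonds along the shortest path, and each vector element sums weight products divided by the squared distance. The inverse-square form, the atom typing and all names (`bond_distances`, `edge_distance_vector`) are illustrative and are not taken from the paper.

```python
from collections import deque
from itertools import combinations_with_replacement


def bond_distances(n_atoms, bonds):
    """Shortest-path distance (in bonds) between all atom pairs, by BFS."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def edge_distance_vector(types, weights, bonds, n_types):
    """One element per unordered pair of atom types (k <= l).

    Assumed form: element[(k, l)] = sum of w_i * w_j / d_ij**2 over all
    atom pairs i < j whose types are k and l. This is an illustration,
    not the authors' published definition.
    """
    n = len(types)
    dist = bond_distances(n, bonds)
    vec = {pair: 0.0
           for pair in combinations_with_replacement(range(n_types), 2)}
    for i in range(n):
        for j in range(i + 1, n):
            d = dist[i][j]
            if d is None:  # atoms in disconnected fragments are skipped
                continue
            key = tuple(sorted((types[i], types[j])))
            vec[key] += weights[i] * weights[j] / d ** 2
    return vec


# Toy three-atom chain 0-1-2 with made-up types and weights.
toy = edge_distance_vector(types=[0, 1, 0],
                           weights=[1.0, 2.0, 1.0],
                           bonds=[(0, 1), (1, 2)],
                           n_types=2)
print(toy)  # {(0, 0): 0.25, (0, 1): 4.0, (1, 1): 0.0}
```

The resulting fixed-length vector depends only on connectivity and per-atom weights, which is why a descriptor of this kind can be computed from primary structure alone.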
Abstract: A new descriptor, the scores vector of zero dimension, one dimension, two dimension and three dimension (SZOTT), was derived by principal component analysis of a matrix of 1369 structural variables covering 0D, 1D, 2D and 3D information for the 20 coded amino acids. SZOTT scales were then used to express the structures of 20 thromboplastin inhibitors and 34 bactericidal peptides. For the 20 thromboplastin inhibitors, the correlation coefficients of whole calibration (R² = R²cu) and of cross-validation (Q² = R²cv) were 0.989 and 0.748 for the classical partial least squares (PLS) model, and 0.994 and 0.936 for the orthogonal signal correction-partial least squares (OSC-PLS) model. For the 34 bactericidal peptides, R² and Q² were 0.619 and 0.406 by PLS, and 0.910 and 0.503 by OSC-PLS. These satisfactory results showed that the structural information related to biological activity in both data sets could be described by SZOTT, which carries plentiful activity-related information, is convenient to use and is easy to interpret; the predictive capability of the models was also relatively robust. SZOTT has good prospects for wide application in quantitative sequence-activity modeling (QSAM) of peptides.
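The abstract above reports a calibration R² and a cross-validated Q² for each model. As a rough illustration of how the two differ, the sketch below computes both for a one-descriptor ordinary least-squares line, with Q² taken from leave-one-out prediction errors as 1 - PRESS/SS_tot. Plain least squares stands in for PLS and OSC-PLS here, leave-one-out is an assumed validation scheme (the abstract does not say which was used), and the data are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_q2(xs, ys):
    """Return (R2, Q2): calibration fit and leave-one-out cross-validation."""
    n = len(ys)
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)

    # R2: residuals of the model fitted on all samples.
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

    # Q2: each sample is predicted by a model fitted without it.
    press = 0.0
    for i in range(n):
        s, c = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (s * xs[i] + c)) ** 2

    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot


# Invented descriptor values and activities, for illustration only.
r2, q2 = r2_and_q2([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.9, 5.1])
print(round(r2, 3), round(q2, 3))
```

Because each left-out sample is predicted by a model that never saw it, Q² cannot exceed R² for a least-squares fit; a large gap between the two, as in the PLS results for the bactericidal peptides, suggests the calibration fit overstates predictive ability.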
Funding: Supported by the Foundations of National High Technology (863) Programme (Grant No. 2006AA02Z312); State New Drug Project (Grant No. 1996ND1035A01); Fok Ying Tung Educational Foundation (Grant No. 980706); State Key Laboratory of Chemo/Biosensing and Chemometrics Foundation (Grant No. KLCB005-0012); Chongqing University Innovation Fund (Grant No. CUIF030506); Chongqing Municipality Applied Science Fund (Grant No. CASF01-3-6); Momentous Juche Innovation Fund for Tackle Key Problem Items (Grant No. MJIF 06-9-9).
Abstract: A new descriptor, the vector of topological and structural information for coded and noncoded amino acids (VTSA), was derived by principal component analysis (PCA) from a matrix of 66 topological and structural variables of 134 amino acids. The VTSA vector was then applied to two sets of peptide quantitative structure-activity relationships or quantitative sequence-activity models (QSARs/QSAMs). Models built by genetic partial least squares (GPLS), support vector machine (SVM) and immune neural network (INN) gave good results. For the data sets of 58 angiotensin-converting enzyme inhibitors (ACEI) and 89 elastase substrate catalyzed kinetics (ESCK), the R², cross-validation R² and root mean square error of estimation (RMSEE) were as follows: ACEI, R²cu = 0.82, Q²cu = 0.77, E_rmse = 0.44 (GPLS+SVM); ESCK, R²cu = 0.84, Q²cu = 0.82, E_rmse = 0.20 (GPLS+INN).
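The abstract above derives amino-acid descriptors as PCA scores of a variables matrix (rows are amino acids, columns are structural variables). A minimal sketch of that step is given below, assuming autoscaled columns and extracting only the first principal component by power iteration. The scaling choice, the single component, the function name `first_pc_scores` and the toy matrix are all assumptions made for illustration; the abstract does not state the preprocessing or the number of components kept.

```python
import math


def first_pc_scores(rows, iters=200):
    """First principal component of an autoscaled data matrix.

    rows: list of samples, each a list of variable values.
    Returns (loadings, scores). Assumes no column is constant and that
    the starting vector is not orthogonal to the first component.
    """
    n, p = len(rows), len(rows[0])

    # Autoscale: centre each column and divide by its sample std deviation.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

    # Covariance of autoscaled data, i.e. the correlation matrix.
    cov = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
            for b in range(p)] for a in range(p)]

    # Power iteration converges to the dominant eigenvector (loadings).
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Scores: projection of each sample onto the loadings; these values
    # would play the role of one descriptor per amino acid.
    scores = [sum(z[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores


# Toy matrix: 4 "amino acids" described by 2 perfectly correlated variables.
loadings, scores = first_pc_scores([[1.0, 2.0], [2.0, 4.0],
                                    [3.0, 6.0], [4.0, 8.0]])
print(loadings, scores)
```

In a sequence-activity model, the score vectors of the residues in a peptide would then be concatenated position by position to form the peptide's descriptor row.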