A new set of descriptors, namely score vectors of the zero dimension, one dimension, two dimensions and three dimensions (SZOTT), was derived from principal component analysis of a matrix of 1369 structural variables including 0D, 1D, 2D and 3D information for the 20 coded amino acids. The SZOTT scales were then used to predict cleavage sites of human immunodeficiency virus type 1 (HIV-1) protease. Linear discriminant analysis (LDA) and support vector machines (SVM) were applied to develop the prediction models. For LDA and SVM, respectively, the Matthews correlation coefficients (MCC) were 0.879 and 0.911 by the resubstitution test, 0.849 and 0.901 by leave-one-out cross validation (LOOCV), and 0.822 and 0.846 by external validation. Receiver operating characteristic (ROC) analysis showed that the SVM model has better fitting and predictive ability than the LDA model. These satisfactory results show that SZOTT descriptors can be further used to predict cleavage sites of HIV-1 protease.
Funding: Supported by the National High-tech R&D Program (the 863 Program) (Grant No. 2006AA02Z312) and the Innovative Group Program for Graduates of Chongqing University, Science and Innovation Fund (Grant No. 200711C1A0010260).
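The Matthews correlation coefficient used to compare the LDA and SVM models can be computed directly from the confusion matrix of a binary classifier. The sketch below is illustrative only: the function names and the toy labels are assumptions and are not taken from the study, and the label 1 is assumed to mean "cleaved" while 0 means "not cleaved".

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = cleaved, 0 = not cleaved)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical toy labels, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
mcc = matthews_corrcoef(*confusion_counts(y_true, y_pred))
print(f"MCC = {mcc:.3f}")
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it more informative than plain accuracy when the two classes are unequal in size.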