Abstract
Feature extraction from protein sequences is the most basic step in predicting protein structural classes, and it largely determines prediction quality. To predict structural classes accurately, we construct a 14-dimensional feature vector from protein secondary and super-secondary structure information that reflects the content and spatial ordering of a given protein sequence. Seven of these features, which describe α-helix bundles, hairpin motifs, Rossmann folds, -plaits and other super-secondary structure information, are proposed here for the first time. Experiments show that, compared with other methods, our method improves overall accuracy on the low-similarity datasets 1189 and 640 by 0.9% to 3.8% and 0.5% to 4.2%, respectively, and has a competitive advantage for predicting proteins in and classes.
Authors
Longlong Liu
Jing Cui
Jie Zhou
Longlong Liu; Jing Cui; Jie Zhou (Department of Mathematical Sciences, Ocean University of China, Qingdao, China)