Abstract
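Rapidly determining the cofactor dependency type of an oxidoreductase from its sequence provides important guidance for understanding its structure, function, and catalytic mechanism, and for constructing cofactor regeneration systems. The pseudo-amino acid composition method proposed by Chou was modified and used to extract feature values from oxidoreductase sequences, and the k-nearest neighbor algorithm was used to predict the cofactor dependency type. With λ=48 and w=0.1, 10-fold cross-validation gave an area under the ROC curve of 0.9536 and a prediction accuracy of 92.0%, which is 3.5% higher than the accuracy of the pseudo-amino acid composition under its optimal conditions; compared with 7 other common feature extraction methods, the modified pseudo-amino acid composition performed best. The results indicate that predicting the cofactor dependency type of oxidoreductases from sequence is feasible, and that the modified pseudo-amino acid composition is a promising new method for extracting feature values from protein sequences.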
The cofactor dependency type of a newly found oxidoreductase sequence is usually determined by experimental analysis, which is both time-consuming and costly. With the rapid growth in the number of oxidoreductase sequences entering the databanks, it is highly desirable to explore whether newly found oxidoreductases can be classified into their respective cofactor dependency classes by an automated method. In this study, we propose a modified version of Chou's pseudo-amino acid composition method to extract features from sequences and use the k-nearest neighbor algorithm as the classifier; the results were very encouraging. With λ=48 and w=0.1, the area under the ROC curve of the k-nearest neighbor classifier in 10-fold cross-validation was 0.9536 and the success rate was 92.0%, which was 3.5% higher than that of the pseudo-amino acid composition under its optimal conditions. The modified method also outperformed all 7 other feature extraction methods tested. Our results show that predicting the cofactor dependency type of oxidoreductases from sequence is feasible and that the modified pseudo-amino acid composition may be a useful method for extracting features from protein sequences.
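The feature extraction and classification steps named in the abstract can be illustrated with a minimal sketch. It implements the classical (unmodified) pseudo-amino acid composition, because the abstract does not describe the modification itself, followed by a plain k-nearest neighbor vote. Several choices here are assumptions rather than details from the paper: a single property scale (the Kyte-Doolittle hydropathy index) stands in for whatever physicochemical properties the authors used, Euclidean distance is used for the neighbor search, and the function names `pse_aac` and `knn_predict` are hypothetical.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. ASSUMPTION: used here as a stand-in for
# the property scales of the paper, which the abstract does not list.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    mean = sum(scale.values()) / len(scale)
    std = sqrt(sum((v - mean) ** 2 for v in scale.values()) / len(scale))
    return {aa: (v - mean) / std for aa, v in scale.items()}


H = _standardize(KD)


def pse_aac(seq, lam=48, w=0.1):
    """Classical pseudo-amino acid composition: a vector of 20 + lam values."""
    residues = [aa for aa in seq.upper() if aa in H]  # skip non-standard letters
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    counts = Counter(residues)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k apart.
    thetas = []
    for k in range(1, lam + 1):
        total = sum((H[residues[i + k]] - H[residues[i]]) ** 2
                    for i in range(n - k))
        thetas.append(total / (n - k))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def knn_predict(train_x, train_y, x, k=1):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(v, x))), y)
        for v, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

With the settings reported in the abstract (λ=48, w=0.1), `pse_aac` would return a 68-dimensional vector per sequence, and every input sequence would need more than 48 residues. The weight w controls how much the sequence-order terms contribute relative to the plain amino acid frequencies.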
Source
《生物工程学报》
CAS
CSCD
PKU Core (Peking University Core Journals)
2008, Issue 8, pp. 1439-1445 (7 pages)
Chinese Journal of Biotechnology
Funding
Natural Science Foundation of Fujian Province (No. 2007J0360)
Specialized Research Fund for the Doctoral Program of Higher Education (No. 20070385001)
Keywords
oxidoreductases
cofactor dependency type
modified pseudo-amino acid composition
k-nearest neighbor
area under the ROC curve