Abstract
Boosting-based decision tree ensemble classifiers were applied to discriminate thermophilic from mesophilic proteins. Model performance and robustness were evaluated by three methods: a self-consistency test, 5-fold cross-validation, and an independent test on a separate dataset. LogitBoost, a more recent Boosting algorithm, performed better than AdaBoost, with overall accuracies of 100%, 88.4% and 89.5% under the three methods, respectively. LogitBoost performed comparably to, or even better than, a neural network, a powerful classifier widely used in the biological literature. The influence of protein size on discrimination was also examined. The results suggest that effectively combining Boosting with other existing classifiers may further strengthen the prediction of many attributes of biological macromolecules.
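The abstract does not give the tree learner, the protein features, or the datasets, so the following is only a minimal sketch of the general idea: two-class LogitBoost with one-split regression trees (stumps) as the base learner, scored by the three evaluation protocols named above. Everything in it is an assumption made for illustration: the function names, the number of boosting rounds, and the synthetic two-feature data produced by `toy_data`, which stand in for real sequence-derived features and are not the authors' data or code.

```python
import math
import random


def fit_stump(X, z, w):
    """Weighted least-squares regression stump: (feature, threshold, left, right)."""
    mean = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    err = sum(wi * (zi - mean) ** 2 for wi, zi in zip(w, z))
    best = (err, 0, math.inf, mean, mean)  # fallback: constant predictor
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            sw = [0.0, 0.0]   # weight sums, index 0 = right side, 1 = left side
            swz = [0.0, 0.0]  # weighted response sums, same indexing
            for row, zi, wi in zip(X, z, w):
                side = 1 if row[j] <= t else 0
                sw[side] += wi
                swz[side] += wi * zi
            if not sw[0] or not sw[1]:
                continue  # degenerate split caused by rounding of the midpoint
            right, left = swz[0] / sw[0], swz[1] / sw[1]
            err = sum(wi * (zi - (left if row[j] <= t else right)) ** 2
                      for row, zi, wi in zip(X, z, w))
            if err < best[0]:
                best = (err, j, t, left, right)
    return best[1:]


def stump_value(stump, row):
    j, t, left, right = stump
    return left if row[j] <= t else right


def logitboost(X, y, rounds=10):
    """Two-class LogitBoost with labels y in {0, 1}; returns the fitted stumps."""
    F = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        p = [1.0 / (1.0 + math.exp(-2.0 * f)) for f in F]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]          # working weights
        z = [max(-4.0, min(4.0, (yi - pi) / wi))               # clipped working response
             for yi, pi, wi in zip(y, p, w)]
        stump = fit_stump(X, z, w)
        stumps.append(stump)
        F = [f + 0.5 * stump_value(stump, row) for f, row in zip(F, X)]
    return stumps


def predict(stumps, row):
    return 1 if sum(stump_value(s, row) for s in stumps) > 0 else 0


def accuracy(stumps, X, y):
    return sum(predict(stumps, row) == yi for row, yi in zip(X, y)) / len(y)


def cross_validate(X, y, k=5, rounds=10, seed=0):
    """k-fold cross-validation accuracy, pooled over all held-out samples."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in range(k):
        test = set(idx[fold::k])
        train = [i for i in idx if i not in test]
        model = logitboost([X[i] for i in train], [y[i] for i in train], rounds)
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)


def toy_data(n, seed):
    """Hypothetical two-feature samples; label 1 and label 0 are two Gaussian blobs."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 1.0 if label else -1.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_data(60, seed=0)
    X_new, y_new = toy_data(30, seed=1)   # stands in for an independent dataset
    model = logitboost(X, y)
    print("self-consistency:", accuracy(model, X, y))
    print("5-fold cross-validation:", cross_validate(X, y))
    print("independent test:", accuracy(model, X_new, y_new))
```

The self-consistency score reuses the training samples, so it is normally the most optimistic of the three figures; cross-validation and the independent set give a better sense of how a model generalizes, which matches the ordering of the accuracies reported in the abstract.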
Source
《生物工程学报》
CAS
CSCD
PKU Core (Peking University core journal list)
2006, Issue 6, pp. 1026-1031 (6 pages)
Chinese Journal of Biotechnology
Funding
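Research Fund of the Overseas Chinese Affairs Office of the State Council (No. 05Q0018)
Key Project Fund of the Fujian Provincial Science and Technology Program (No. 2003I020)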
Keywords
Boosting, decision tree, ensemble classifier, pattern recognition, thermophilic protein