Abstract
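Whether a point mutation increases or decreases protein stability is one of the core problems in molecular biology and protein engineering, and an important area of current bioinformatics research. Methods that predict the stability change after a point mutation from protein sequence information are simple and widely applicable, and have therefore been studied and applied extensively. By exploring coding schemes, the authors found that the choice of coding scheme has a considerable effect on prediction accuracy, and that the BLOSUM scoring matrix, which is based on evolutionary information, can be used to predict the stability of point mutants with high accuracy. Applying artificial neural network (ANN) and support vector machine (SVM) algorithms with an encoding based on the BLOSUM62 scoring matrix improves the prediction of stability after point mutation. Moreover, on a data set of 1623 sequences, the measured results of ANN+BLOSUM62 were better than those of several prediction programs currently in common international use.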
Different approaches have been developed to predict the stability change caused by a protein point mutation. Some use the sequence information of the protein, while others use its structural information. The authors address the problem using protein sequence information. The encoding schemes used by machine learning approaches to this problem differ from one another; sparse encoding and amino acid property encoding have prevailed in the field. Physicochemical properties such as hydropathy, flexibility, electronic charge concentration, and the isotropic surface area (ISA) of amino acids have been chosen as input attributes. The evolutionary information embedded in BLOSUM matrices was tested in this paper, and an improvement over previous encoding schemes was found in the experiments. Machine learning techniques such as support vector machines (SVM) and artificial neural networks were used to evaluate the new encoding schemes. The results show that the BLOSUM62 encoding scheme outperforms both the sparse encoding scheme and the amino acid property encoding scheme.
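The difference between sparse encoding and BLOSUM62 encoding can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the function names are hypothetical, a mutation is represented simply as the wild-type vector followed by the mutant vector, and only the two BLOSUM62 rows needed for the example (A and G) are included. The input layout actually used by the authors may differ.

```python
# Standard 20 amino acids in the usual BLOSUM column order.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Two rows of the BLOSUM62 matrix, in AMINO_ACIDS column order.
# Only A and G are listed to keep the sketch short; a real encoder
# would load the full 20 x 20 matrix from a trusted source.
BLOSUM62_ROWS = {
    "A": [4, -1, -2, -2, 0, -1, -1, 0, -2, -1,
          -1, -1, -1, -2, -1, 1, 0, -3, -2, 0],
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4,
          -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
}


def sparse_encode(residue):
    """Sparse (one-hot) encoding: 20 values with a single 1."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    return [1 if aa == residue else 0 for aa in AMINO_ACIDS]


def blosum_encode(residue):
    """BLOSUM62 encoding: the residue's row of substitution scores."""
    if residue not in BLOSUM62_ROWS:
        raise KeyError(f"row for {residue!r} is not included in this sketch")
    return list(BLOSUM62_ROWS[residue])


def encode_mutation(wild_type, mutant, encoder):
    """Assumed layout: wild-type vector followed by mutant vector."""
    return encoder(wild_type) + encoder(mutant)


if __name__ == "__main__":
    # Alanine to glycine substitution under both schemes.
    print(encode_mutation("A", "G", sparse_encode))
    print(encode_mutation("A", "G", blosum_encode))
```

Under the sparse scheme every pair of distinct residues is equally far apart, whereas the BLOSUM62 rows give similar residues similar vectors, which is one plausible reason such an encoding could help a classifier such as an ANN or SVM.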
Source
《生物物理学报》
CAS
CSCD
Peking University Core Journals
2009, Issue 5, pp. 343-348 (6 pages)
Acta Biophysica Sinica
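Funding
Supported by the Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education (project G0610)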
Keywords
Protein point mutation
Coding schemes
Stability prediction
Neural network
SVM
BLOSUM62