Abstract
We present a method for modeling protein families with an artificial neural network (ANN). The method rests on using the ANN to identify significant patterns shared by a set of protein sequences from the same family; the trained model can then serve as a predictive tool for protein family classification. The patterns the ANN can identify are not restricted in length, exact position, or spacing, and the sequences used to build the model and those to be classified do not need to be pre-aligned. To measure the quality of the approach, we built an ANN model for each of 20 families selected from the Pfam database. Averaged over the 20 families, the ANN detected 95.5% of the true positives.
Source
《生物数学学报》
CSCD
2003, Issue 3, pp. 351-356 (6 pages)
Journal of Biomathematics