Abstract: Knowledge about characteristics shared across known members of a protein family enables their identification within the complete set of proteins in an organism. Shared features are usually expressed through motifs, which can incorporate specific patterns and even amino acid (AA) biases. Based on a set of classification patterns and biases, it can be determined which additional proteins may belong to a specific family and share its functionality. A bioinformatics tool (Prot-Class) was implemented to examine protein sequences and characterize them based upon user-defined AA composition percentages and user-defined AA patterns. In addition, the tool allows for the identification of repeated AA patterns, biased AA compositions within windows of user-defined length, and the characteristics of putative signal peptides and glycosylphosphatidylinositol (GPI) lipid anchors. Prot-Class is general purpose and can be applied to analyze protein sequences from any organism. The Prot-Class source code is available under the GNU General Public License v3 and can be accessed via the Google Code Repository: http://code.google.com/p/prot-class/.
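The kinds of checks described in the abstract (overall AA composition, AA patterns, and biased composition within a window) can be illustrated with a minimal Python sketch. This is not the Prot-Class implementation: the function names `composition`, `biased_windows`, and `find_pattern`, the use of regular expressions for patterns, and the toy sequence are all assumptions made purely for illustration.

```python
import re
from collections import Counter


def composition(seq):
    """Return the percentage of each amino acid present in seq (hypothetical helper)."""
    counts = Counter(seq)
    n = len(seq)
    return {aa: 100.0 * c / n for aa, c in counts.items()}


def biased_windows(seq, residues, window, min_percent):
    """Return (start, percent) for every window of the given length in which
    the residues listed in `residues` make up at least min_percent of the window."""
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        pct = 100.0 * sum(chunk.count(r) for r in residues) / window
        if pct >= min_percent:
            hits.append((start, pct))
    return hits


def find_pattern(seq, pattern):
    """Return (start, matched_text) for each non-overlapping regex match.
    Expressing AA patterns as regular expressions is an assumption of this sketch."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq)]


if __name__ == "__main__":
    toy = "MKSTSTSSAAGC"  # made-up sequence, not real data
    print(composition(toy))
    print(biased_windows(toy, "ST", window=6, min_percent=80))
    print(find_pattern(toy, r"[ST]{2}A"))
```

In this sketch the window scan is a plain sliding window with step 1, which is simple to read but recounts every window from scratch; a running count would be more efficient for long sequences.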