Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 90403120, 10474041, and 10021001) and the Nonlinear Project (973) of the NSM.
Abstract: Homology detection plays a key role in bioinformatics, and the substitution matrix is one of its most important components. Thus, besides improving alignment algorithms, another effective way to enhance the accuracy of homology detection is to use appropriate substitution matrices or even to construct new ones. Studying the features of various matrices and comparing their performance in homology detection enables us to choose the most suitable matrix for a specific application. In this paper, taking the BLOSUM matrices as an example, some detailed features of matrices in homology detection are studied by calculating the distributions of the numbers of recognized proteins over different sequence identities and sequence lengths. Our results clearly show that different matrices have different preferences and abilities in recognizing remote homologous proteins. Furthermore, the detailed features of the various matrices can be used to improve the accuracy of homology detection.
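The distribution of recognized proteins over sequence identity mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names `identity_bin` and `recognized_by_identity`, the 10% bin width, and the sample hits are all hypothetical and serve only to show what such a tabulation might look like.

```python
from collections import Counter


def identity_bin(identity_pct, width=10):
    """Map a percent identity (0-100) to the lower edge of its bin."""
    # 100% is folded into the top bin, so bins are [0,10), ..., [90,100].
    return min(int(identity_pct // width) * width, 100 - width)


def recognized_by_identity(hits, width=10):
    """Count recognized proteins per identity bin for each matrix.

    hits: iterable of (matrix_name, percent_identity) pairs, one pair per
    recognized homolog. Returns {matrix_name: Counter({bin_start: count})}.
    """
    dist = {}
    for matrix, identity in hits:
        dist.setdefault(matrix, Counter())[identity_bin(identity, width)] += 1
    return dist


# Made-up illustrative hits (assumption), not data from the paper.
hits = [
    ("BLOSUM45", 18.0), ("BLOSUM45", 22.5), ("BLOSUM45", 41.0),
    ("BLOSUM62", 27.0), ("BLOSUM62", 35.5), ("BLOSUM62", 100.0),
]
dist = recognized_by_identity(hits)
for matrix in sorted(dist):
    print(matrix, sorted(dist[matrix].items()))
```

The same tabulation could be repeated with sequence length in place of percent identity to obtain the second kind of distribution the abstract refers to.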