Abstract
Suboptimal alignments often reveal additional biologically interesting features and have been used successfully to estimate, informally, the significance of an optimal alignment. Moreover, traditional dynamic programming algorithms for sequence comparison require quadratic space and are therefore infeasible for long protein or DNA sequences. This paper describes a space-efficient sampling algorithm for computing suboptimal alignments. The algorithm uses a general gap model in which the cost of a gap is given by an affine score, and it randomly selects an alignment according to the distribution of weights over all potential alignments. If x and y are two sequences of lengths n and m, respectively, the space requirement of the algorithm is linear in n + m. Finally, an example illustrates the utility of the algorithm.
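To make the idea of sampling an alignment in proportion to its weight concrete, the following is a minimal sketch and not the paper's algorithm: it stores full quadratic-size tables rather than achieving the linear space bound stated above. It assumes that an alignment's weight is exp(score / T), that a gap of length k costs gap_open + (k - 1) * gap_extend, and that the function names and default parameters (forward_tables, sample_alignment, match, mismatch, T) are hypothetical choices made only for illustration.

```python
import math
import random


def forward_tables(x, y, match=1.0, mismatch=-1.0,
                   gap_open=2.0, gap_extend=0.5, T=1.0):
    """Sum the weights of all global alignments of each prefix pair.

    Z[s][i][j] is the total weight of alignments of x[:i] and y[:j] whose
    last column is of type s: "M" (residue/residue), "X" (x residue over a
    gap) or "Y" (gap over a y residue). Quadratic space, illustration only.
    Plain floats may underflow on long sequences.
    """
    n, m = len(x), len(y)
    wo = math.exp(-gap_open / T)    # weight factor for opening a gap
    we = math.exp(-gap_extend / T)  # weight factor for extending a gap
    Z = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    Z["M"][0][0] = 1.0  # virtual start state (the empty alignment)
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                s = match if x[i - 1] == y[j - 1] else mismatch
                Z["M"][i][j] = math.exp(s / T) * sum(
                    Z[t][i - 1][j - 1] for t in "MXY")
            if i > 0:
                Z["X"][i][j] = (wo * (Z["M"][i - 1][j] + Z["Y"][i - 1][j])
                                + we * Z["X"][i - 1][j])
            if j > 0:
                Z["Y"][i][j] = (wo * (Z["M"][i][j - 1] + Z["X"][i][j - 1])
                                + we * Z["Y"][i][j - 1])
    return Z, wo, we


def sample_alignment(x, y, rng=random, **params):
    """Draw one alignment with probability proportional to its weight."""
    Z, wo, we = forward_tables(x, y, **params)
    i, j = len(x), len(y)
    # Choose the type of the final column according to the three totals.
    state = rng.choices("MXY", weights=[Z[s][i][j] for s in "MXY"])[0]
    top, bottom = [], []
    while (i, j) != (0, 0):
        if state == "M":
            i, j = i - 1, j - 1
            top.append(x[i])
            bottom.append(y[j])
            trans = {"M": 1.0, "X": 1.0, "Y": 1.0}
        elif state == "X":
            i -= 1
            top.append(x[i])
            bottom.append("-")
            trans = {"M": wo, "X": we, "Y": wo}
        else:
            j -= 1
            top.append("-")
            bottom.append(y[j])
            trans = {"M": wo, "X": wo, "Y": we}
        # Pick the previous column type in proportion to its contribution
        # to the recurrence that produced the current table entry.
        state = rng.choices(
            "MXY", weights=[trans[s] * Z[s][i][j] for s in "MXY"])[0]
    return "".join(reversed(top)), "".join(reversed(bottom))
```

Calling sample_alignment("GATTACA", "GCATGCA") repeatedly returns different gapped string pairs, with higher-scoring alignments drawn more often; raising T in this sketch flattens the distribution toward more suboptimal alignments.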
Funding
Supported by the National Natural Science Foundation of China (Grant No. 10771133).