ON COUNTING THE FREQUENCY DISTRIBUTION OF STRING MOTIFS IN MOLECULAR SEQUENCES
Authors: MATTIA C. F. PROSPERI, LUCIANO PROSPERI, REBECCA R. GRAY, MARCO SALEMI. International Journal of Biomathematics, 2012, No. 6, pp. 121-139 (19 pages)
This work investigates frequency distributions of strings within a text. The mathematical derivation accounts for variable alphabet size, character probabilities, and string/text lengths, under both the Bernoullian and the Markovian model for string generation. The analysis is limited to the set of nonclumpable strings, which cannot overlap with themselves. Two formulae (exact and approximated) are derived, calculating the frequency distribution of a string of length m found inside a text of length n (with m < n). The approximated formula has constant complexity (in contrast to the exponential complexity of the exact one), which makes it applicable to very long texts. The proposed formulae were applied to analyze string frequencies in a portion of the human genome, and to recalculate frequencies of known repeated motifs within genes associated with genetic diseases. A comparison with state-of-the-art methods was provided. The formulae presented here can be of use in the statistical evaluation of specific motif frequencies within very long texts (e.g. genes or genomes) and help in characterizing motifs in pathologic conditions.
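As a rough illustration of the two notions the abstract relies on, the sketch below tests whether a string is nonclumpable (no proper prefix equals a suffix, so two occurrences cannot overlap) and computes the expected number of occurrences under a Bernoullian (independent characters) model. It is not the exact or approximated formula from the paper, which give the full frequency distribution; the function names `is_nonclumpable` and `expected_count` and the `probs` dictionary are assumptions made for this example only.

```python
from math import prod


def is_nonclumpable(s: str) -> bool:
    """Return True if no proper prefix of s equals a suffix of s.

    Such a string cannot overlap with a shifted copy of itself.
    """
    return not any(s[:k] == s[-k:] for k in range(1, len(s)))


def expected_count(s: str, n: int, probs: dict[str, float]) -> float:
    """Expected number of occurrences of s in a random text of length n.

    Assumes characters are drawn independently with probabilities `probs`
    (Bernoullian model). By linearity of expectation, each of the
    n - m + 1 start positions contributes the probability of s.
    """
    m = len(s)
    if m > n:
        return 0.0
    return (n - m + 1) * prod(probs[c] for c in s)


if __name__ == "__main__":
    uniform_dna = {c: 0.25 for c in "ACGT"}  # hypothetical uniform alphabet
    print(is_nonclumpable("ACG"), is_nonclumpable("ACA"))
    print(expected_count("ACG", 1000, uniform_dna))
```

The expectation alone holds for any string; the nonclumpable restriction matters once the variance or the whole distribution of the count is needed, because self-overlapping strings produce correlated occurrences.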
Keywords: combinatorics; genetics; genomics.