Abstract
Based on the conservation of nucleotides around splice sites, differences in base composition, and the 3-periodicity of the reading frame in coding sequences, the complete genome sequences of five model organisms are classified into three types: exon, intron, and intergenic sequence. Each of the three standard sources of diversity is built from the probabilities (bp/kb) of the 64 trimers over the whole sequence together with the probabilities of the 4 bases at a total of 30 positions at the head and tail of the sequence (near the splice sites), giving 184 parameters. The type of a given sequence is determined by the increment of diversity between that sequence's measure of diversity and the three standard measures of diversity over the corresponding regions. The results show that prediction with the 184 signal parameters is 5% more accurate than prediction with the 64 trimer parameters alone. The overall prediction success rates are 87.37% for C. elegans, 91.08% for A. thaliana, 92.28% for D. melanogaster, 92.88% for the prokaryote E. coli (two sequence types), and 94.88% for S. cerevisiae.
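The classification rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the sketch assumes the commonly used definitions: the measure of diversity D(X) = N log N - sum(n_i log n_i) for a count vector X with total N, and the increment of diversity ID(X, S) = D(X + S) - D(X) - D(S). It also assumes that the 30 positions are the first 15 and last 15 bases of a sequence, that all 184 counts are combined in one vector, and that a sequence is assigned to the class with the smallest increment. The names `FLANK`, `features`, `build_sources`, and `classify` are hypothetical and do not come from the paper.

```python
import math
from itertools import product

BASES = "ACGT"
TRIMERS = ["".join(p) for p in product(BASES, repeat=3)]  # 64 trimers
FLANK = 15  # assumption: 15 positions at each end, 30 in total


def features(seq):
    """Return a 184-component count vector: 64 trimer counts
    followed by 4 base indicators at each of 30 end positions."""
    seq = seq.upper()
    if len(seq) < 2 * FLANK:
        raise ValueError("sequence must have at least 30 bases")
    counts = dict.fromkeys(TRIMERS, 0)
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip trimers with ambiguous bases
            counts[tri] += 1
    vec = [counts[t] for t in TRIMERS]
    ends = seq[:FLANK] + seq[-FLANK:]
    for base_at_pos in ends:
        vec.extend(1 if base_at_pos == b else 0 for b in BASES)
    return vec


def diversity(counts):
    """Measure of diversity D = N log N - sum(n_i log n_i), with 0 log 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, s):
    """ID(X, S) = D(X + S) - D(X) - D(S); zero when X and S are proportional."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)


def build_sources(training):
    """Sum the feature vectors of the training sequences of each class."""
    sources = {}
    for label, seqs in training.items():
        total = [0] * (len(TRIMERS) + 4 * 2 * FLANK)
        for seq in seqs:
            total = [a + b for a, b in zip(total, features(seq))]
        sources[label] = total
    return sources


def classify(seq, sources):
    """Assign the class whose standard source gives the smallest increment."""
    x = features(seq)
    return min(sources, key=lambda label: increment_of_diversity(x, sources[label]))


# Toy sequences for illustration only; they are not data from the paper.
toy_training = {
    "exon": ["ATGGCCAAGGTTCTGACCGGTGAAGCTATGGCC"],
    "intron": ["GTAAGTATTTTTCTAATTTTATTTTCTTTTTACAG"],
    "intergenic": ["TATATATAAAAAGCGCGCTATATATATAAATATATAT"],
}
toy_sources = build_sources(toy_training)
print(classify("GTAAGTATTTTTCTAATTTTATTTTCTTTTTACAG", toy_sources))
```

In practice the standard sources would be accumulated from many annotated sequences per class, and restricting the vector to its first 64 components would give the trimer-only variant that the abstract compares against.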
Source
《内蒙古大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University core journal list)
2005, No. 2, pp. 166-172 (7 pages)
Journal of Inner Mongolia University: Natural Science Edition
Funding
National Natural Science Foundation of China (30160025)