Analysis and prediction of exon, intron, intergenic region and splice sites for A. thaliana and C. elegans genomes


Abstract: Although a great deal of research has been undertaken on the annotation of gene structure, predictive techniques are still not fully developed. In this paper, based on the base composition of sequences and the conservation of nucleotides at exon/intron splice sites, a least increment of diversity algorithm (LIDA) is developed to study and predict three kinds of coding exons, introns, and intergenic regions. First, coding exon, intron, and intergenic sequences are predicted using the 64 trinucleotide compositions and 120 position parameters of the four bases as informational parameters. The overall prediction accuracies are 91.1% and 88.4% for the A. thaliana and C. elegans genomes, respectively. Subsequently, based on the position frequencies of the four bases in regions near the intron/coding exon boundaries and the translation initiation and termination sites, 12 position parameters are selected as the source of diversity, and the three kinds of coding exons are predicted with the LIDA. The prediction success rates are higher than 80%. These results can be used in sequence annotation.
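The abstract does not give the formulas behind the LIDA, so the following is only a minimal sketch of how a least-increment-of-diversity classifier over trinucleotide composition might look. It assumes the commonly used definitions D(X) = N ln N - sum(n_i ln n_i) for the diversity of a count vector and ID(X, Y) = D(X + Y) - D(X) - D(Y) for the increment of diversity. The function names, the use of overlapping trinucleotides, and the toy sequences are illustrative assumptions, not details taken from the paper, and the 120 position parameters are not modelled here.

```python
import math
from collections import Counter
from itertools import product

# All 64 trinucleotides in a fixed order (assumed feature layout).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_counts(seq):
    """Count overlapping trinucleotides; windows with non-ACGT symbols are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return [counts.get(t, 0) for t in TRINUCLEOTIDES]


def diversity(counts):
    """Measure of diversity: D(X) = N ln N - sum(n_i ln n_i), with 0 ln 0 = 0."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar sources."""
    combined = [a + b for a, b in zip(x, y)]
    return diversity(combined) - diversity(x) - diversity(y)


def classify(query_counts, class_sources):
    """Assign the query to the class whose source gives the least increment of diversity."""
    return min(
        class_sources,
        key=lambda label: increment_of_diversity(query_counts, class_sources[label]),
    )


if __name__ == "__main__":
    # Toy training sources (hypothetical sequences, for illustration only).
    sources = {
        "exon": trinucleotide_counts("ATGGCCGCCGCTGCCGAGGCC"),
        "intron": trinucleotide_counts("GTAAGTTTTTATTTTAATTTTCAG"),
    }
    query = trinucleotide_counts("GCCGCCGAGGCT")
    print(classify(query, sources))
```

In practice each class source would be built from many annotated training sequences per genome, and the raw increment depends on the size of each source, so some normalization may be needed before comparing classes.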

Authors: Hao Lin, Qian-Zhong Li, Cui-Xia Chen

Affiliation: not specified

Source: Journal of Biomedical Science and Engineering (Chinese title: 生物医学工程（英文）), 2009, No. 6, pp. 367-373 (7 pages)

Keywords: exon; intron; intergenic region; splice site; increment of diversity

Classification: R73 [Medicine and Health: Oncology]


