Promoter recognition in human genome based on KL divergence and BP neural network

Cited by: 2
Abstract: Promoter prediction and recognition in the human genome is an important task in DNA sequence analysis. We present a human promoter recognition algorithm based on KL divergence and a BP neural network. KL divergence is used to extract the most discriminative 6-mers for distinguishing promoter regions from non-promoter regions, and the occurrence frequencies of these 6-mers serve as the component features for promoter recognition. The component features are combined with CpG island features, and BP neural networks are applied to build a human promoter recognition system. The system consists of three classifiers: a promoter-exon classifier, a promoter-intron classifier, and a promoter-3'UTR classifier. Each classifier is a BP neural network. An unknown sequence is predicted to be a promoter if two or three of the classifiers regard it as one. On the test set, the system reaches a sensitivity of 51.4% and a specificity of 52.9%.
Source: Journal of Liaoning Normal University (Natural Science Edition) (《辽宁师范大学学报(自然科学版)》), CAS, 2010, No. 1, pp. 42-45 (4 pages)
Funding: Liaoning Province Doctoral Research Startup Fund (20061052)
Keywords: promoter recognition; component feature; CpG islands; KL divergence; BP neural network
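
The abstract does not give the exact scoring formula or the voting code, so the following is only a minimal sketch of how the 6-mer selection and the two-of-three decision rule might look. It assumes that each 6-mer is scored by its term in the symmetric KL divergence between promoter and non-promoter 6-mer distributions, with add-one smoothing. The function names (`kmer_frequencies`, `top_discriminative_kmers`, `feature_vector`, `majority_vote`) are hypothetical and do not come from the paper. The BP neural networks and the CpG island features are left out.

```python
import math
from collections import Counter
from itertools import product

K = 6  # the abstract uses 6-mers


def kmer_frequencies(seqs, k=K, pseudocount=1.0):
    """Smoothed relative frequency of every k-mer over a set of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
                counts[kmer] += 1
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts.values()) + pseudocount * len(all_kmers)
    return {m: (counts[m] + pseudocount) / total for m in all_kmers}


def top_discriminative_kmers(pos_seqs, neg_seqs, n, k=K):
    """Rank k-mers by their term in the symmetric KL divergence (assumed criterion)."""
    p = kmer_frequencies(pos_seqs, k)
    q = kmer_frequencies(neg_seqs, k)
    # p*log(p/q) + q*log(q/p) simplifies to (p - q)*log(p/q), which is never negative.
    score = {m: (p[m] - q[m]) * math.log(p[m] / q[m]) for m in p}
    return sorted(score, key=score.get, reverse=True)[:n]


def feature_vector(seq, kmers):
    """Occurrence frequency of each selected k-mer within one sequence."""
    k = len(kmers[0])
    seq = seq.upper()
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[m] / windows for m in kmers]


def majority_vote(votes):
    """Promoter if at least two of the three classifiers say promoter."""
    return sum(bool(v) for v in votes) >= 2
```

In a full system each of the three classifiers would be trained on such feature vectors for its own pair of classes (promoter versus exon, intron, or 3'UTR), and `majority_vote` would then combine their three yes/no outputs.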