Abstract
Promoter prediction and recognition in the human genome is an important task in DNA sequence analysis. We present a human promoter recognition algorithm based on Kullback-Leibler (KL) divergence and back-propagation (BP) neural networks. Using KL divergence, we extract the 6-mers that best discriminate promoter regions from non-promoter regions and use their frequencies of occurrence as compositional features. We combine these compositional features with CpG-island features and apply BP neural networks to build a human promoter recognition system. The system consists of three classifiers (promoter-exon, promoter-intron and promoter-3' UTR), each of which is a BP neural network. Their outputs are combined: an unknown sequence is predicted to be a promoter if two or three of the classifiers classify it as one. On the test set, the system achieves a sensitivity of 51.4% and a specificity of 52.9%.
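The abstract describes the pipeline only at a high level. The Python sketch below illustrates the feature side and the two-of-three vote, under assumptions that do not come from the paper. It ranks 6-mers by their per-word term in a symmetrised KL divergence with add-one smoothing. GC content and the CpG observed/expected ratio stand in for the unspecified CpG-island features. The three trained BP networks are replaced by hypothetical placeholder callables. All function names are illustrative.

```python
import math
from collections import Counter
from itertools import product

K = 6  # hexamers ("6-mers"), as in the abstract


def kmer_counts(seqs, k=K):
    """Count overlapping k-mers over DNA sequences, skipping windows with non-ACGT bases."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if set(w) <= set("ACGT"):
                counts[w] += 1
    return counts


def kl_ranked_kmers(pos_seqs, neg_seqs, top_n, k=K, pseudo=1.0):
    """Return the top_n k-mers ranked by their term in the symmetrised KL divergence.

    P = smoothed k-mer distribution in promoters, Q = in non-promoters.
    Score(w) = (p - q) * log(p / q) >= 0, so both promoter-enriched and
    promoter-depleted k-mers score high. ASSUMPTION: the paper's exact
    scoring and smoothing are not given in the abstract.
    """
    cp, cq = kmer_counts(pos_seqs, k), kmer_counts(neg_seqs, k)
    words = ["".join(t) for t in product("ACGT", repeat=k)]
    zp = sum(cp.values()) + pseudo * len(words)
    zq = sum(cq.values()) + pseudo * len(words)

    def score(w):
        p = (cp[w] + pseudo) / zp
        q = (cq[w] + pseudo) / zq
        return (p - q) * math.log(p / q)

    return sorted(words, key=score, reverse=True)[:top_n]


def features(seq, kmers, k=K):
    """Selected k-mer frequencies plus two CpG-island features.

    ASSUMPTION: GC content and the CpG observed/expected ratio stand in for
    the paper's unspecified CpG-island features.
    """
    seq = seq.upper()
    counts = kmer_counts([seq], k)
    n_windows = max(len(seq) - k + 1, 1)
    freqs = [counts[w] / n_windows for w in kmers]
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_content = (c + g) / len(seq) if seq else 0.0
    obs_exp = cpg * len(seq) / (c * g) if c and g else 0.0
    return freqs + [gc_content, obs_exp]


def predict_promoter(x, classifiers):
    """Call a promoter iff at least two of the three classifiers say 'promoter'."""
    votes = sum(1 for clf in classifiers if clf(x))
    return votes >= 2


if __name__ == "__main__":
    promoters = ["GGCGCGCGGCCGCG", "CGCGGGCGCCGCGC"]
    others = ["ATATTTAAATATTA", "TTAATATAAATTTA"]
    top = kl_ranked_kmers(promoters, others, top_n=4)
    x = features("GCGCGCCGCGGA", top)
    # Hypothetical stand-ins for the trained promoter-exon, promoter-intron
    # and promoter-3' UTR BP networks: each just thresholds some features.
    clfs = [lambda v: v[-1] > 0.6, lambda v: v[-2] > 0.5, lambda v: sum(v[:-2]) > 0]
    print(top, predict_promoter(x, clfs))
```

The symmetrised score is used here because a plain per-word KL term p·log(p/q) is negative for words depleted in promoters, although such words can be just as discriminative. In a real system, each placeholder would be a BP network trained on promoter sequences against exon, intron or 3' UTR sequences respectively.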
Source
Journal of Liaoning Normal University: Natural Science Edition (《辽宁师范大学学报(自然科学版)》)
Indexed in: CAS
2010, No. 1, pp. 42-45 (4 pages)
Funding
Liaoning Provincial Doctoral Research Start-up Fund Project (20061052)