Abstract
To further study recognition algorithms for Escherichia coli promoters, a support vector machine (SVM) method, combined with the molecular biology of E. coli genes, is applied to promoter recognition. Based on the sequence conservation of promoters, a 65-base sequence is selected from each promoter sample as a positive sample, and sequences of the same length from E. coli coding regions are selected as negative samples; an SVM-based classifier is then built. The selection of kernel function parameters when applying the SVM method is also discussed. Experimental results show that the SVM-based recognition method extracts the statistical features of conserved promoter sequences more effectively, and the correlation coefficient between positive and negative samples can reach 81.62%.
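The ingredients named in the abstract (fixed-length 65-base samples, a kernel with a tunable parameter, and a correlation coefficient as the score) can be illustrated with a minimal sketch. The abstract does not state how sequences were encoded, which kernel was used, or how the correlation coefficient was defined, so the one-hot encoding, the Gaussian (RBF) kernel with parameter `gamma`, and the Matthews-style correlation coefficient below are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

BASES = "ACGT"
WINDOW = 65  # sequence length stated in the abstract


def one_hot(seq):
    """Encode a DNA string as a flat 4 * WINDOW vector of 0/1 values.

    Assumption: the paper's actual feature encoding is not given in the abstract.
    """
    seq = seq.upper()
    if len(seq) != WINDOW:
        raise ValueError(f"expected {WINDOW} bases, got {len(seq)}")
    vec = []
    for base in seq:
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r}")
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def rbf_kernel(x, y, gamma):
    """Gaussian kernel exp(-gamma * ||x - y||^2).

    Assumption: gamma stands in for the kernel parameter whose
    selection the abstract says was discussed.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def correlation_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Assumption: the abstract's "correlation coefficient" may be defined differently.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Under this encoding, two sequences that differ at a single position have squared distance 2, so their kernel value is `exp(-2 * gamma)`; a larger `gamma` makes the classifier more sensitive to individual base differences, which is why the parameter needs tuning.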
Source
Journal of Beijing University of Technology (《北京工业大学学报》)
CAS
CSCD
PKU Core (北大核心)
2004, No. 4, pp. 432-436 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (60234020).
Keywords
support vector machine
Escherichia coli promoter
recognition