Abstract
Although a large amount of research has been devoted to transcription start site (TSS) localization, the problem has not yet been fully resolved. Building on an existing promoter prediction algorithm, this paper proposes a sliding-window-based method for computationally localizing TSSs in Escherichia coli. The original promoter classifier is improved by introducing a composite motif model into its signal features during training. The resulting window classifier is slid along the genomic sequence and, within a reasonably bounded TSS search range, computes a TSS-likelihood score for each candidate position in turn. The distribution of distances between TSSs and translation start sites (TLSs) is used to assign each position a TSS-position score, and TSS locations are predicted from the final score profile, which combines the TSS-likelihood and TSS-position scores. Tests on real E. coli data show that the method finds putative TSSs effectively and substantially reduces the number of false positives.
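The abstract only outlines the scoring pipeline, so the Python sketch below is illustrative and is not the authors' implementation. It assumes a hypothetical window classifier built from the E. coli sigma-70 consensus boxes (TTGACA at -35, TATAAT at -10) combined as a composite motif with a 15-19 bp spacer, a fixed -10 box offset (`OFFSET10`), a smoothed histogram of training TSS-TLS distances as the position score, and a simple additive combination `likelihood + weight * log P(distance)`. All function names and parameters (`likelihood_score`, `position_log_probs`, `localize_tss`, `weight`, `top`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for the paper's trained window classifier: consensus
# sigma-70 boxes scored jointly as a composite (-35, spacer, -10) motif.
BOX35, BOX10 = "TTGACA", "TATAAT"
SPACERS = range(15, 20)  # assumed allowed -35/-10 spacer lengths (bp)
OFFSET10 = 12            # assumed: -10 box starts 12 bp upstream of the TSS


def match_score(seq, start, box):
    """Count positions where seq[start:start+len(box)] agrees with box."""
    if start < 0 or start + len(box) > len(seq):
        return 0  # window falls off the sequence: no evidence
    return sum(a == b for a, b in zip(seq[start:start + len(box)], box))


def likelihood_score(seq, tss):
    """Toy TSS-likelihood in [0, 1] for candidate TSS index `tss`."""
    start10 = tss - OFFSET10
    s10 = match_score(seq, start10, BOX10)
    # Composite motif: best -35 match over all allowed spacer lengths.
    s35 = max(match_score(seq, start10 - sp - len(BOX35), BOX35)
              for sp in SPACERS)
    return (s10 + s35) / (len(BOX10) + len(BOX35))


def position_log_probs(train_distances, max_dist, pseudo=1.0):
    """Smoothed log-probability of each TSS-to-TLS distance 0..max_dist."""
    counts = Counter(d for d in train_distances if 0 <= d <= max_dist)
    total = sum(counts.values()) + pseudo * (max_dist + 1)
    return [math.log((counts[d] + pseudo) / total)
            for d in range(max_dist + 1)]


def localize_tss(seq, tls, pos_logp, weight=1.0, top=3):
    """Slide over candidate TSS positions upstream of the TLS at index `tls`
    (bounded by len(pos_logp)) and rank them by the combined score."""
    scored = []
    for dist, logp in enumerate(pos_logp):
        tss = tls - dist
        if tss < 0:
            break  # search range runs past the start of the sequence
        scored.append((tss, likelihood_score(seq, tss) + weight * logp))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    # Synthetic sequence: promoter boxes, TSS at index 55, ATG (TLS) at 95.
    seq = ("C" * 20 + "TTGACA" + "C" * 17 + "TATAAT" + "C" * 6
           + "A" + "C" * 39 + "ATG" + "C" * 20)
    pos_logp = position_log_probs([38, 40, 40, 42, 60, 100], max_dist=150)
    print(localize_tss(seq, tls=95, pos_logp=pos_logp))
```

In this synthetic example the only exact composite-motif match and the most frequent training distance (40 bp) coincide, so index 55 ranks first. On real genomes the classifier, the search bound, and the weighting between the two scores would presumably have to be trained and tuned, as the paper does with its own classifier and distance data.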
Source
《国防科技大学学报》
EI
CAS
CSCD
Peking University Core Journal
2006, No. 4, pp. 88-92 (5 pages)
Journal of National University of Defense Technology
Funding
National Natural Science Foundation of China (Grant No. 60471003)
Keywords
Escherichia coli
transcription start site (TSS)
computational localization
composite motif
sliding window