Abstract
Developing novel algorithms that combine the sequence patterns of regulatory elements, such as enhancers and silencers, with the patterns of splicing signals is important for splice site prediction. In this paper, a statistical model of splicing signals is built based on the entropy density profile (EDP) method, the weight array method (WAM), and the κ test; in addition, a model of splicing regulatory elements is developed using an unsupervised self-learning method to detect motifs associated with regulatory elements. With the two models incorporated, a multi-level support vector machine (SVM) system is devised to perform ab initio prediction of splice sites from DNA sequences in eukaryotic genomes. Results of large-scale tests on human genomic splice sites show that the new method achieves comparatively high performance in splice site prediction. The method performs at least as well as, and usually better than, the existing SpliceScan method, which is based on modeling regulatory elements, and it has higher accuracy than traditional methods that model splicing signals, such as GeneSplicer. In particular, the method has a clear advantage in splice site prediction for genes with lower GC content.
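As an illustration of one ingredient named in the abstract, the weight array method can be sketched as a position-specific first-order Markov model whose log-odds score compares a candidate window against true-site and decoy models. This is a minimal sketch and not the authors' implementation: the function names (`train_wam`, `log_prob`, `wam_score`), the pseudocount, and the toy sequences are all assumptions made for illustration, and the EDP, κ test, and SVM stages are not shown.

```python
import math

ALPHABET = "ACGT"


def train_wam(seqs, pseudocount=1.0):
    """Fit a position-specific first-order Markov model (hypothetical helper).

    seqs: equal-length, aligned windows over A/C/G/T.
    Returns (first, trans), where first[b] = P(b at position 0) and
    trans[i][(a, b)] = P(b at position i | a at position i-1) for i >= 1.
    """
    length = len(seqs[0])
    # Start every count at the pseudocount so no probability is zero.
    first = {b: pseudocount for b in ALPHABET}
    trans = [None] + [
        {(a, b): pseudocount for a in ALPHABET for b in ALPHABET}
        for _ in range(1, length)
    ]
    for s in seqs:
        first[s[0]] += 1
        for i in range(1, length):
            trans[i][(s[i - 1], s[i])] += 1
    # Normalize counts into probabilities.
    total = sum(first.values())
    first = {b: c / total for b, c in first.items()}
    for i in range(1, length):
        for a in ALPHABET:
            row = sum(trans[i][(a, b)] for b in ALPHABET)
            for b in ALPHABET:
                trans[i][(a, b)] /= row
    return first, trans


def log_prob(seq, model):
    """Log-probability of seq under a model returned by train_wam."""
    first, trans = model
    lp = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        lp += math.log(trans[i][(seq[i - 1], seq[i])])
    return lp


def wam_score(seq, true_model, false_model):
    """Log-odds score: positive values favor the true-site model."""
    return log_prob(seq, true_model) - log_prob(seq, false_model)


# Toy data, invented for illustration only (not real splice sites).
true_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT"]
decoys = ["ACCTTTCA", "TTGTCCAA", "GACTAGCT"]
true_model = train_wam(true_sites)
false_model = train_wam(decoys)
```

Under these assumptions, `wam_score("AGGTAAGT", true_model, false_model)` should exceed the score of a decoy-like window such as `"ACCTTTCA"`. In a system like the one described, a score of this kind would be only one of several features passed to a classifier.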
Funding
State Basic Research Program of China (Grant No. 2003CB715905)
National Natural Science Foundation of China (Grant Nos. 30300071, 30770499 and 10721403)
Youth Foundation of College of Engineering of Peking University
Keywords
gene prediction
splice site
splicing signal
regulatory element