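Abstract
Objective To select a better method for recognizing and classifying human RNA polymerase (Pol) Ⅱ promoter data and to improve recognition accuracy. Methods Knowledge-based statistical encoding, CpG encoding, pentamer (Pentamers) encoding, and pattern dictionary (Pattern Dictionary) encoding were applied; finally, consensus models were built, and a support vector machine (SVM) was used to classify the promoter data. Results When the encoded promoter data were recognized with the SVM, higher recognition accuracy, specificity, and sensitivity were obtained than with other SVM-based tools. The new encoding methods were applied to the recognition of promoter data from human chromosome 22, where the pattern dictionary encoding reached a recognition accuracy of 90.98%. Conclusion The consensus model takes into account the independence of each sub-model and the differences between models, and exploits their complementary strengths, thereby improving the final recognition accuracy.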
Objective To recognize human Pol Ⅱ promoters and to select a coding method that improves recognition accuracy. Methods Novel encoding methods were applied to human promoter sequences, including statistical code, CpG code, Pentamers code, and Pattern Dictionary code; finally, consensus models were built, and the promoter sequences were recognized with a Support Vector Machine (SVM). Results The recognition accuracy, sensitivity, and specificity were higher than those of other SVM-based tools. The accuracy of promoter recognition on human chromosome 22 reached 90.98%. Conclusion The consensus models account for the independence of the sub-models and the differences between them, and exploit the complementary strengths of the sub-models.
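The abstract names its encodings but does not give their definitions, so the following is only a minimal sketch of what pentamer and CpG features and a consensus step could look like. The function names (`pentamer_features`, `cpg_features`, `consensus_vote`), the choice of normalized 5-mer frequencies, the observed/expected CpG ratio, and simple majority voting are all assumptions made for illustration, not the authors' published method. In practice each feature vector would be passed to an SVM classifier, which is omitted here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
# All 4^5 = 1024 possible pentamers, in a fixed order (assumed feature layout).
PENTAMERS = ["".join(p) for p in product(ALPHABET, repeat=5)]


def pentamer_features(seq):
    """Return normalized frequencies of every 5-mer in seq (hypothetical encoding)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 5] for i in range(len(seq) - 4))
    # Only windows made of A/C/G/T are counted toward the total.
    total = sum(counts[p] for p in PENTAMERS)
    if total == 0:
        return [0.0] * len(PENTAMERS)
    return [counts[p] / total for p in PENTAMERS]


def cpg_features(seq):
    """Return [GC content, observed/expected CpG ratio] (hypothetical encoding)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return [0.0, 0.0]
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_content = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return [gc_content, obs_exp]


def consensus_vote(predictions):
    """Majority vote over 0/1 sub-model labels; ties go to 0 (assumed rule)."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0
```

Each sub-model (one per encoding) would emit a 0/1 label for a candidate sequence, and `consensus_vote` combines them; a weighted or probability-based combination would be an equally plausible reading of "consensus model".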
Source
《哈尔滨医科大学学报》
CAS
Peking University Core Journals (北大核心)
2012, Issue 1, pp. 23-26 (4 pages)
Journal of Harbin Medical University