Abstract
Motif search is one of the most challenging problems in bioinformatics and computer science, and it plays an important role in locating transcription factor binding sites in unaligned DNA sequences. This paper converts the motif search problem into the problem of finding maximum cliques in an undirected graph and proposes a random projection motif search algorithm based on maximum clique refinement, called MCR2PA. Compared with the original projection algorithm, MCR2PA achieves better prediction accuracy on most motif search problems. Experimental results on multiple sets of real biological data demonstrate the practicality of the proposed algorithm; in particular, the prediction accuracy exceeds 80% on Saccharomyces cerevisiae data.
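The two ideas named in the abstract, random projection and the maximum-clique formulation, can be illustrated with a minimal sketch. This is not the authors' MCR2PA: the function names (`project_buckets`, `build_graph`, `greedy_clique`) are hypothetical, the edge rule (connect l-mers from different sequences whose Hamming distance is at most 2d) is an assumption borrowed from the usual planted (l, d) motif setting, and the clique step is a simple greedy heuristic rather than an exact maximum-clique solver.

```python
import random
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def project_buckets(seqs, l, k, rng):
    """Random projection: hash every l-mer by the letters at k random positions.

    Returns the chosen positions and a dict mapping each projected key to the
    list of (sequence index, offset) sites that hash to it.
    """
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for i, s in enumerate(seqs):
        for j in range(len(s) - l + 1):
            lmer = s[j:j + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((i, j))
    return positions, buckets


def build_graph(seqs, sites, l, d):
    """Undirected graph over candidate sites (assumed edge rule: different
    sequences and Hamming distance <= 2d)."""
    adj = {v: set() for v in sites}
    for u, v in combinations(sites, 2):
        if u[0] == v[0]:
            continue  # at most one motif instance per sequence in this sketch
        a = seqs[u[0]][u[1]:u[1] + l]
        b = seqs[v[0]][v[1]:v[1] + l]
        if hamming(a, b) <= 2 * d:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def greedy_clique(adj):
    """Greedy clique heuristic: repeatedly take the candidate with the most
    neighbours among the remaining candidates. Not guaranteed to be maximum."""
    clique = []
    candidates = set(adj)
    while candidates:
        v = max(sorted(candidates), key=lambda u: len(adj[u] & candidates))
        clique.append(v)
        candidates &= adj[v]  # keep only vertices adjacent to every chosen one
    return clique


# Toy input (made up for illustration): "ACGT" is planted in every sequence.
seqs = ["TTACGT", "ACGTCC", "GGACGT"]
l, d, k = 4, 0, 2
positions, buckets = project_buckets(seqs, l, k, random.Random(0))
sites = [(i, j) for i, s in enumerate(seqs) for j in range(len(s) - l + 1)]
clique = greedy_clique(build_graph(seqs, sites, l, d))
```

In this sketch the graph is built over all sites of the toy input; a projection-based method would more plausibly build it only over the sites in a well-populated bucket, which is what keeps the clique search small.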
Source
China Sciencepaper (《中国科技论文》)
CAS
PKU Core Journal (北大核心)
2013, No. 4, pp. 342-349 (8 pages)
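Funding
National Natural Science Foundation of China (61173025)
Specialized Research Fund for the Doctoral Program of Higher Education of China (20100203110010)
Fundamental Research Funds for the Central Universities (K5051303002)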
Keywords
motif search
transcription factor binding sites
maximum cliques
random projection