Abstract
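Motif identification uses computer algorithms to find a set of DNA sequence fragments that are similar in both function and form, and thereby to locate the transcription factor binding sites that control the regulation of gene expression. This paper converts the problem into a model that the AP (affinity propagation) clustering algorithm can process and uses AP clustering to obtain stable clusters of candidate motifs. A greedy algorithm then refines the result to produce a set of candidate motifs. This candidate set is evaluated with a relative entropy measure, and the best motifs are output, which together yield a new motif identification algorithm. Experimental results on both simulated and real data demonstrate the effectiveness of the proposed algorithm.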
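The clustering step can be illustrated with textbook affinity propagation (the message-passing method of Frey and Dueck) run over all length-k windows of the input sequences. This is a minimal sketch under stated assumptions, not the paper's model: similarity is taken as negative Hamming distance, the preference (self-similarity) is the median similarity, and `kmers`, `hamming`, and `affinity_propagation` are hypothetical helper names.

```python
from statistics import median


def kmers(seqs, k):
    """Every length-k window of every sequence: the items to be clustered."""
    return [s[i:i + k] for s in seqs for i in range(len(s) - k + 1)]


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def affinity_propagation(items, damping=0.5, iters=200):
    """Textbook affinity propagation; returns, for each item, the index of its exemplar."""
    n = len(items)
    if n < 2:
        return list(range(n))
    # Similarity = negative Hamming distance (an assumption made for this sketch).
    s = [[-float(hamming(items[i], items[j])) for j in range(n)] for i in range(n)]
    # Preference (self-similarity) = median off-diagonal similarity, a common default.
    pref = median(s[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        s[i][i] = pref
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], damped.
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            top = max(range(n), key=lambda k: vals[k])
            second = max(vals[k] for k in range(n) if k != top)
            for k in range(n):
                new = s[i][k] - (second if k == top else vals[top])
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        # a(k,k) <- sum_{i' != k} max(0, r(i',k));
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), damped.
        for k in range(n):
            pos = sum(max(0.0, r[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = pos
                else:
                    new = min(0.0, r[k][k] + pos - max(0.0, r[i][k]))
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    # Exemplars label themselves; every other item joins its most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda e: s[i][e])
            for i in range(n)]
```

For example, `affinity_propagation(kmers(["TTACGTTT", "GGACGTGG", "CCACGTCC"], 4))` labels each 4-mer with its exemplar. The windows that share an exemplar form one candidate motif cluster. AP is convenient here because it does not need the number of clusters in advance. The preference value controls how many exemplars emerge.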
Transcription factors can bind to special DNA sequences that control the gene transcription process. These special DNA sequences are called motifs. Motif identification is the task of finding a set of DNA fragments that are similar in both function and form, and it plays a crucial role in research on the structure and function of genes. The problem was converted into a model that the AP clustering algorithm can process, and AP clustering was then used to obtain stable candidate motifs. Finally, a greedy algorithm refines the clustering results to produce a group of candidate motif sets. These sets are evaluated by information content, and the optimal motif set is output. In this way, a new algorithm is designed for the problem. Experimental results on both simulated and real data demonstrate the validity of the proposed algorithm.
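To make the evaluation and refinement steps concrete, here is a minimal sketch. It is not the authors' implementation. The score is the relative entropy (Kullback-Leibler divergence, in bits) of each motif column's base distribution from a background distribution, summed over columns. Under a uniform background, this equals the information content mentioned above. The sketch also assumes a pseudocount of 0.5, exactly one motif instance per input sequence (aligned by index), and a simple coordinate-wise greedy search. The names `profile`, `relative_entropy`, and `greedy_refine` are hypothetical.

```python
import math

ALPHABET = "ACGT"


def profile(instances, pseudocount=0.5):
    """Column-wise base frequencies of equal-length motif instances, with pseudocounts."""
    k = len(instances[0])
    cols = []
    for j in range(k):
        counts = {b: pseudocount for b in ALPHABET}
        for inst in instances:
            counts[inst[j]] += 1
        total = sum(counts.values())
        cols.append({b: counts[b] / total for b in ALPHABET})
    return cols


def relative_entropy(instances, background=None):
    """Sum over columns of KL(p_j || q) in bits; q defaults to a uniform background."""
    q = background or {b: 0.25 for b in ALPHABET}
    return sum(
        p[b] * math.log2(p[b] / q[b])
        for p in profile(instances)
        for b in ALPHABET
    )


def greedy_refine(seqs, instances, rounds=5):
    """Hypothetical greedy refinement: for each sequence in turn, swap its instance for
    the length-k window of that sequence that most increases the set's relative entropy.
    Only strict improvements are accepted, so the score never decreases."""
    k = len(instances[0])
    current = list(instances)
    for _ in range(rounds):
        changed = False
        for idx, seq in enumerate(seqs):
            best, best_score = current[idx], relative_entropy(current)
            for start in range(len(seq) - k + 1):
                window = seq[start:start + k]
                trial = current[:idx] + [window] + current[idx + 1:]
                score = relative_entropy(trial)
                if score > best_score:
                    best, best_score = window, score
            if best != current[idx]:
                current[idx] = best
                changed = True
        if not changed:  # converged to a local optimum
            break
    return current
```

A fully conserved set such as `["ACGT"] * 4` scores about 3.17 bits. A set with no column preference such as `["AAAA", "CCCC", "GGGG", "TTTT"]` scores 0. Candidate sets refined from different clusters can therefore be ranked by `relative_entropy`, and the highest-scoring set output. Because the greedy search only accepts improvements, it can stop at a local optimum that depends on the starting instances.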
Source
Journal of Zhengzhou University (Engineering Science)
CAS
Peking University Core Journal
2015, No. 3, pp. 110-114 (5 pages)
Funding
Fundamental Research Funds for the Central Universities (K50513100011)
Keywords
gene transcription
motif identification
AP clustering algorithm