DNA Sequences Motif Discovery Based on DFD Clustering

Abstract: A data mining method, MMHC, is proposed for discovering DNA sequence motifs. First, seed-based mismatch clustering forms candidate motif clusters. Next, a depth first determination (DFD) algorithm based on relative entropy and cluster complexity identifies the true motif clusters. Finally, conservation region scanning (CRS) and MAP value-preservation filtering (MVPF) optimize the motif clusters. MMHC was compared experimentally with three classic motif discovery methods, MEME, AlignACE and SOMBRERO, on two classes of DNA sequence datasets. The results show that, on most datasets, MMHC clearly outperforms all three methods, both in the reliability and precision of the discovered motifs and in how well its clusters reflect the cluster structure of the background species.
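The abstract names the MMHC stages but gives no formulas, so the following Python sketch illustrates only two of the underlying ideas in generic form: seed-based mismatch clustering of fixed-width words, and scoring each candidate cluster by the relative entropy of its position frequency matrix against a background distribution. Everything concrete here is an assumption, not taken from the paper: the window width `w`, mismatch threshold `d`, pseudocount of 0.25, uniform background, using every distinct word as a seed, forward-strand-only matching, and uppercase ACGT input. The cluster-complexity term, the depth-first determination procedure, CRS and MVPF are not modeled, so this is not the MMHC algorithm itself.

```python
import math
from collections import Counter

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def words_of_width(seqs: list[str], w: int) -> list[str]:
    """Every length-w window of every sequence (forward strand only, an assumption)."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]


def mismatch_cluster(seed: str, words: list[str], d: int) -> list[str]:
    """Seed-based mismatch clustering: all windows within d mismatches of the seed."""
    return [x for x in words if hamming(seed, x) <= d]


def profile(cluster: list[str], pseudo: float = 0.25) -> list[dict[str, float]]:
    """Per-column base probabilities of the (ungapped) cluster, with pseudocounts."""
    total = len(cluster) + pseudo * len(BASES)
    cols = []
    for j in range(len(cluster[0])):
        counts = Counter(x[j] for x in cluster)
        cols.append({b: (counts[b] + pseudo) / total for b in BASES})
    return cols


def relative_entropy(cols, background=None) -> float:
    """Sum over columns of KL(column || background), in bits."""
    bg = background or {b: 0.25 for b in BASES}  # assumed uniform background
    return sum(p[b] * math.log2(p[b] / bg[b]) for p in cols for b in BASES)


def rank_candidates(seqs: list[str], w: int, d: int):
    """Score one mismatch cluster per distinct seed; highest relative entropy first."""
    words = words_of_width(seqs, w)
    scored = []
    for seed in sorted(set(words)):  # hypothetical seeding: every distinct window
        cluster = mismatch_cluster(seed, words, d)
        scored.append((relative_entropy(profile(cluster)), seed, len(cluster)))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    seqs = ["CCTGACGTCC", "GGTGACGTGG", "AATGACGTAA"]
    for score, seed, size in rank_candidates(seqs, w=6, d=1)[:3]:
        print(f"{seed}  size={size}  RE={score:.2f} bits")
```

On the three toy sequences, which share the planted word TGACGT, that seed ranks first with a cluster of three windows. Relative entropy alone rewards tight, highly conserved clusters; per the abstract, the paper's DFD step also weighs cluster complexity, which this sketch deliberately leaves out.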

Authors: He Hongzhou (何红洲), Zhou Mingtian (周明天)

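Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China; School of Mathematics and Computer Science, Mianyang Normal University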

Source: Acta Biophysica Sinica (《生物物理学报》), 2013, No. 5, pp. 384-394 (11 pages). Indexed in CAS, CSCD and the Peking University Core Journals list.

Funding: Natural Science Research Project of the Sichuan Provincial Department of Education (12ZB070)

Keywords: motif discovery; clustering analysis; depth first determination; conservation region scanning

CLC number: Q523 [Biology: Biochemistry]
