
Similarity Analysis of DNA Sequences Based on Average Mutual Information (基于平均交互信息量的DNA序列相似性分析)

Cited by: 1
Abstract: Sequence similarity analysis is an important problem in bioinformatics and is of great significance for studying the evolutionary origin of species. Sequence similarity algorithms fall into two classes: alignment-based and alignment-free. Alignment-based methods are somewhat weak at measuring a sequence as a whole. Alignment-free methods include DNA curve representation methods and methods that compare the information content of the overall base distributions of the sequences; these consider only differences in whole-sequence information and ignore differences at individual sites. This paper therefore proposes a similarity measure based on information entropy that combines sequence alignment with information-content differences: the ratio of the average mutual information between two aligned sequences to their joint entropy is taken as the similarity of the two sequences. The measure is used to construct a similarity matrix for 11 species and to analyze the similarity between them; the results agree to a certain extent with biological taxonomy. The phylogenetic tree built from the distance matrix also reflects the evolutionary relationships among the species, indicating that the model design is reasonable.
Authors: ZHAN Qing (詹青), WANG Yadong (王亚东), School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2011, No. 2X, pp. 31-34, 52 (5 pages)
Keywords: bioinformatics; similarity of DNA sequences; information entropy; average mutual information; phylogenetic tree
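
The similarity measure described in the abstract, average mutual information between two aligned sequences divided by their joint entropy, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names (`entropy`, `mi_similarity`), the use of empirical column frequencies, base-2 logarithms, treating the gap character as an ordinary symbol, and returning 0.0 when the joint entropy is zero are all assumptions that the abstract does not specify.

```python
from collections import Counter
from math import log2


def entropy(counts, total):
    """Shannon entropy (bits) of an empirical distribution given as counts."""
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mi_similarity(seq_a, seq_b):
    """Ratio I(X;Y) / H(X,Y) estimated from two aligned sequences.

    Each aligned column (seq_a[i], seq_b[i]) is treated as one sample of
    the joint distribution. Gap characters such as '-' are counted like
    any other symbol (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")

    n = len(seq_a)
    h_x = entropy(Counter(seq_a), n)              # H(X)
    h_y = entropy(Counter(seq_b), n)              # H(Y)
    h_xy = entropy(Counter(zip(seq_a, seq_b)), n)  # H(X,Y)

    if h_xy == 0:
        # Both sequences are constant, so the ratio is undefined;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0

    mutual_information = h_x + h_y - h_xy         # I(X;Y)
    return mutual_information / h_xy


if __name__ == "__main__":
    print(mi_similarity("ACGTACGT", "ACGTACGT"))
    print(mi_similarity("AACC", "ACAC"))
```

The ratio lies between 0 (aligned columns statistically independent) and 1 (each sequence fully determines the other). Because it measures statistical dependence between columns, it is unchanged by a consistent relabeling of bases in one sequence. How the paper converts this similarity into the distance matrix used for the phylogenetic tree is not stated in the abstract.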


