Abstract
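With the launch and completion of the genome projects for human and several other model organisms, the number of DNA sequences has grown exponentially, making sequence analysis one of the core problems of bioinformatics. A sequence, by its very name, involves two essential factors: its elements and the order relations among them. This paper presents a new method that takes the order relations among elements into account. The inversion number of a permutation on an ordinary set is first generalized to multisets, which yields the notion of the inversion number of a permutation with repeated elements. On this basis, the digits 1, 2, 3, and 4 are assigned to the four bases so that the original DNA sequence becomes a numerical permutation over a multiset, and the inversion number is then used to construct a 24-dimensional vector representation of the DNA sequence. Phylogenetic analyses on three data sets confirm the effectiveness of the method.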
With the development and completion of the genome projects of human and some model organisms, the number of known DNA sequences has accumulated at an exponential rate, which has made sequence analysis one of the central topics of bioinformatics. As the name "sequence" suggests, a sequence naturally involves two important factors: its elements and their order. This paper introduces a new method that takes the order of the elements into account. We first define the inverted sequence number of r-permutations over a multiset, based on the traditional inverted sequence number. A number sequence is obtained by assigning 1, 2, 3, 4 to the four bases, and a 24-dimensional vector is then constructed by means of the inverted sequence number. Phylogenetic analysis of three data sets shows that our method is effective.
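The construction outlined in the abstract can be illustrated with a minimal Python sketch. It rests on two assumptions that the abstract does not confirm: that the inversion number of a sequence with repeated elements counts the pairs i < j with a_i > a_j (equal elements contribute nothing), and that the 24 dimensions correspond to the 4! = 24 ways of assigning 1, 2, 3, 4 to the bases A, C, G, T. The function names `inversion_number` and `inversion_vector` are hypothetical, and any normalization the paper may apply is not reproduced here.

```python
from itertools import permutations

BASES = "ACGT"  # assumed base alphabet; the abstract only says "the four bases"


def inversion_number(values):
    """Count pairs i < j with values[i] > values[j].

    Assumed generalization to multisets: repeated (equal) elements
    do not form an inversion. Values must lie in 1..4.
    """
    counts = [0] * 5  # counts[v] = how many times v has been seen so far
    inversions = 0
    for v in values:
        # Every earlier element strictly greater than v forms an inversion.
        inversions += sum(counts[v + 1:])
        counts[v] += 1
    return inversions


def inversion_vector(dna):
    """Return 24 inversion numbers, one per assignment of 1..4 to the bases.

    Assumption: the 24 dimensions come from the 4! possible assignments,
    enumerated in the lexicographic order produced by itertools.permutations.
    """
    vector = []
    for digits in permutations((1, 2, 3, 4)):
        mapping = dict(zip(BASES, digits))
        vector.append(inversion_number([mapping[base] for base in dna]))
    return vector


if __name__ == "__main__":
    print(inversion_vector("ACGTTGCA"))
```

Vectors built this way could then be compared with a distance measure to produce a tree; the measure used in the paper is not stated in the abstract, so none is shown here.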
Source
《计算机与应用化学》
CAS
CSCD
PKU Core (Peking University Core Journals)
2014, No. 6, pp. 705-708 (4 pages)
Computers and Applied Chemistry
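Funding
National Natural Science Foundation of China (11171042)
Liaoning Province Growth Program for Outstanding Young Scholars in Higher Education Institutions (LJQ2011122)
Liaoning Province "Hundred, Thousand, Ten Thousand Talents Project" (2012921060)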
Keywords
multiset
inverted sequence number
phylogenetic analysis