Abstract
To characterize the distribution of k-mer frequencies in genomes, the genome sequences of four typical species, Escherichia coli, Bacillus subtilis, Methanococcus jannaschii and Plasmodium falciparum (partial sequence), were statistically analyzed. Information entropies are defined in the "word" domain and in the "frequency" domain, and the relation between each entropy and the window size k is studied. Both entropies show a good linear relation with k. Moreover, the linear relation between the frequency-domain entropy and k is universal among the species studied.
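The abstract does not give the entropy definitions, so the following is only a minimal sketch of what such a calculation might look like. It assumes, hypothetically, that the "word"-domain entropy is the Shannon entropy over the relative frequencies of distinct k-mers, and that the "frequency"-domain entropy is the Shannon entropy over the distribution of occurrence counts (the fraction of distinct k-mers that occur exactly n times). The function names are illustrative and do not come from the paper.

```python
from collections import Counter
from math import log2


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq using a sliding window of step 1."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_entropy(weights):
    """Shannon entropy (in bits) of non-negative weights, normalized to sum to 1."""
    weights = list(weights)
    total = sum(weights)
    return -sum((w / total) * log2(w / total) for w in weights if w > 0)


def word_entropy(seq, k):
    # ASSUMED definition: entropy over the relative frequencies of distinct k-mers.
    return shannon_entropy(kmer_counts(seq, k).values())


def frequency_entropy(seq, k):
    # ASSUMED definition: entropy over the distribution of occurrence counts,
    # i.e. the fraction of distinct k-mers that occur exactly n times.
    count_of_counts = Counter(kmer_counts(seq, k).values())
    return shannon_entropy(count_of_counts.values())
```

Under these assumptions, evaluating `word_entropy(genome, k)` and `frequency_entropy(genome, k)` over a range of k and fitting a straight line to each series would be one way to examine the linear relations the abstract reports.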
Source
Journal of Inner Mongolia University (Natural Science Edition) (《内蒙古大学学报(自然科学版)》)
CAS
CSCD
Peking University Core Journals (北大核心)
2005, No. 3, pp. 301-305 (5 pages)
Funding
National Natural Science Foundation of China project (90103030)