Abstract
The non-random usage of k-mers in genome sequences, and its biological significance, has long attracted attention, but no fundamental progress has yet been made. Using the full genomic sequences of seven species as samples, we obtained the 8-mer frequency spectrum of each genome. The spectra of dog and cow are trimodal, whereas those of zebrafish, medaka, nematode (Caenorhabditis elegans) and yeast (Saccharomyces cerevisiae) are unimodal; the chicken spectrum has a shape intermediate between the two. When the set of 8-mers is partitioned by XY dinucleotide content, only the CG dinucleotide classification yields three subsets (0CG, 1CG and 2CG) whose spectra each form an independent unimodal distribution. Comparison with random sequences shows that 0CG motifs are the result of random evolution, whereas 1CG and 2CG motifs are the result of directed evolution: their usage frequencies are far lower than the random frequencies. This independent separation of the three CG subsets holds across species. The distances between the three CG subset spectra are the underlying cause of the unimodal or multimodal shape of the overall 8-mer spectrum. When the seven genome sequences are normalized to 10^9 bp, the spectra of the 1CG and 2CG subsets correlate significantly with species evolution, whereas the spectrum of the 0CG subset shows no significant relationship to it. We suggest that the three classes of CG motifs perform different biological functions. The independent separation of the 8-mer spectra offers a new way of thinking about genome structure, genome evolution and the biological function of motifs.
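The classification described in the abstract can be illustrated with a minimal Python sketch. It counts overlapping 8-mers, groups every possible 8-mer by the number of CG dinucleotides it contains, and builds one frequency spectrum per group. The abstract does not give the authors' pipeline, so the function names, the skipping of windows that contain non-ACGT characters, the single-strand counting and the unbinned spectrum (occurrence count mapped to number of distinct k-mers) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def count_kmers(seq, k=8):
    """Count overlapping k-mers; windows with non-ACGT characters are skipped (assumption)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def dinucleotide_count(kmer, dinuc="CG"):
    """Number of positions in kmer at which the dinucleotide starts."""
    return sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == dinuc)


def spectra_by_dinucleotide(counts, k=8, dinuc="CG", classes=(0, 1, 2)):
    """Build one spectrum per dinucleotide class.

    Each spectrum maps an occurrence count to the number of distinct k-mers
    in that class observed that many times. All 4**k possible k-mers are
    enumerated, so k-mers absent from the sequence appear at count 0.
    K-mers whose dinucleotide count is not in `classes` are ignored.
    """
    spectra = {c: Counter() for c in classes}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        n = dinucleotide_count(kmer, dinuc)
        if n in spectra:
            spectra[n][counts.get(kmer, 0)] += 1
    return spectra


if __name__ == "__main__":
    # Toy sequence and small k, for illustration only; the paper uses k=8 on whole genomes.
    toy_counts = count_kmers("ACGTACGCGT", k=4)
    for cg_class, spectrum in spectra_by_dinucleotide(toy_counts, k=4).items():
        print(cg_class, sorted(spectrum.items()))
```

On a real genome the occurrence counts would normally be binned before plotting, and passing another dinucleotide as `dinuc` gives the other XY classifications that the abstract compares against CG.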
Source
《生物信息学》
2016, Issue 4, pp. 195-202 (8 pages)
Chinese Journal of Bioinformatics
Funding
National Natural Science Foundation of China (No. 31260219)
National Undergraduate Innovation Training Program (No. 201512149)