Abstract
Comparing and visualizing genome sequences remains challenging because of the large size of genomes. Existing approaches exploit the stability of oligonucleotide frequencies to exhibit the main characteristics of the whole genome, yet they commonly fail to show, at an adjustable scale, how patterns progress along the genome. This paper presents a novel visual encoding technique that not only supports the binning process (phylogenetic analysis) but also allows sequential analysis of the genome. The key idea is to regard the combination of each k-nucleotide and its reverse complement as a visual word, and to represent a long genome sequence as a list of local statistical feature vectors derived from the local frequencies of the visual words. Experimental results on a variety of examples demonstrate that the presented approach can visualize DNA sequences quickly and intuitively, and can help the user identify regions that differ among multiple datasets.
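The key idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how a k-nucleotide and its reverse complement are merged, how local regions are delimited, or how frequencies are normalized. The sketch assumes that the lexicographically smaller of the pair names the visual word, that local regions are fixed-size sliding windows, and that counts are normalized to relative frequencies. The function names and the parameters `window` and `step` are hypothetical.

```python
from itertools import product

# Translation table for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an uppercase DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Merge a k-mer with its reverse complement into one visual word.

    Assumption: the lexicographically smaller of the two is the word's name.
    """
    return min(kmer, reverse_complement(kmer))


def vocabulary(k):
    """Map every visual word of length k to a feature-vector index."""
    words = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    return {word: index for index, word in enumerate(words)}


def local_feature_vectors(seq, k=2, window=8, step=4):
    """Return one relative-frequency vector per sliding window of seq.

    Assumption: fixed-size windows advanced by `step` stand in for the
    paper's local regions. K-mers containing bases other than A, C, G, T
    (for example N) are skipped.
    """
    vocab = vocabulary(k)
    vectors = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        counts = [0] * len(vocab)
        total = 0
        for i in range(len(chunk) - k + 1):
            index = vocab.get(canonical(chunk[i:i + k]))
            if index is not None:
                counts[index] += 1
                total += 1
        vectors.append([c / total if total else 0.0 for c in counts])
    return vectors


if __name__ == "__main__":
    for vector in local_feature_vectors("ACGTACGTTTGACCAGTA", k=2):
        print([round(x, 2) for x in vector])
```

Merging each k-nucleotide with its reverse complement makes the feature vector independent of which strand was sequenced and roughly halves its length: for k = 2 there are 10 visual words instead of 16. Varying `window` and `step` changes the scale at which progression along the sequence is observed.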
Funding
Supported by the National Natural Science Foundation of China (Nos. 60873123 and 60903085), the National Basic Research Program (973) of China (No. 2010CB732504), the Natural Science Foundation of Zhejiang Province (No. Y1080618), and the Open Project Program of the State Key Lab of CAD & CG, Zhejiang University, China (No. A0905).