Abstract: Multiple comparisons among genomes can clarify their evolution, speciation, and functional innovations. To date, the genome sequences of eight grasses representing the most economically important Poaceae (grass) clades have been published, and their genome-level comparison is an essential foundation for evolutionary, functional, and translational research. Using a formal and conservative approach, we aligned these genomes. Direct comparison of paralogous gene pairs that were all duplicated simultaneously reveals striking variation in evolutionary rates among whole genomes, with nucleotide substitution slowest in rice and up to 48% faster in other grasses, adding a new dimension to the value of rice as a grass model. We reconstructed ancestral genome contents for major evolutionary nodes, potentially contributing to understanding the divergence and speciation of grasses. Recent fossil evidence suggests revisions of the estimated dates of key evolutionary events, implying that the pan-grass polyploidization occurred ~96 million years ago and could not be related to the Cretaceous-Tertiary mass extinction as previously inferred. Dating adjusted to reflect both the updated fossil evidence and lineage-specific evolutionary rates suggested that maize subgenome divergence and maize-sorghum divergence were virtually simultaneous, a coincidence that would be explained if polyploidization directly contributed to speciation. This work lays a solid foundation for Poaceae translational genomics.
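The rate-adjusted dating described above rests on the molecular-clock relation T = K / (2r), where K is the substitution distance between two sequences and r is the per-lineage rate. Because all the paralog pairs compared share a single duplication time, a difference in their distances reflects a difference in rate. The minimal sketch below illustrates that logic under stated assumptions: it summarizes per-pair distances by their median, calibrates the reference (rice) rate against the ~96 Myr pan-grass duplication, and every function name and input value is hypothetical, not the authors' data or procedure.

```python
from statistics import median
from typing import Sequence


def relative_rate(k_lineage: Sequence[float], k_reference: Sequence[float]) -> float:
    """Rate of a lineage relative to the reference lineage.

    Assumes both sets of paralog pairs stem from the same duplication event,
    so K = 2rT with a shared T and the ratio of distances equals the ratio of rates.
    """
    return median(k_lineage) / median(k_reference)


def calibrate_rate(k_reference: Sequence[float], event_age_myr: float) -> float:
    """Substitutions per site per million years for the reference lineage (r = K / 2T)."""
    return median(k_reference) / (2.0 * event_age_myr)


def divergence_time(k: float, rate: float) -> float:
    """Molecular-clock age in million years: T = K / (2r)."""
    return k / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical distances, for illustration only (not the paper's data).
    k_rice = [0.9, 1.0, 1.1]
    k_other = [1.4, 1.48, 1.5]
    r_rice = calibrate_rate(k_rice, 96.0)                 # assumed ~96 Myr anchor
    r_other = r_rice * relative_rate(k_other, k_rice)     # 1.48x the rice rate
    k_event = 0.3                                         # hypothetical later event in that lineage
    print(divergence_time(k_event, r_rice))    # naive date using the rice rate
    print(divergence_time(k_event, r_other))   # date adjusted for the faster lineage
```

With these illustrative inputs the second lineage runs 1.48 times faster than rice, so a distance of 0.3 dates to about 28.8 Myr under the rice rate but only about 19.5 Myr once the faster rate is applied; ignoring lineage-specific rates would overestimate the age.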
Abstract: This article presents genomic alignment methods based on the classic Needleman-Wunsch and Smith-Waterman algorithms, the latter optimized with the ABC (artificial bee colony) algorithm. Because genomic alignment has no predefined goal state, the experiments show alternative alignments proposed by ABC. Within the classical algorithm, different types of alignments can arise from a horizontal, vertical, diagonal, and inverse search mechanism over a match-value table. Our ABC-Smith-Waterman algorithm writes the genomic sequences along the rows and columns of this table and searches for similarities; the resulting values are processed by ABC to yield additional alignments that scientists can use in their experiments and research.
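For readers unfamiliar with the match-value table mentioned above, the following is a minimal sketch of plain Smith-Waterman local alignment in Python with a linear gap penalty. It shows how diagonal (match/mismatch), vertical, and horizontal moves fill the table and how a traceback recovers one optimal local alignment. The ABC optimization is not implemented here, and the scoring values (match=2, mismatch=-1, gap=-2) are assumptions chosen for illustration.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Local alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # vertical move: gap in b
            left = H[i][j - 1] + gap    # horizontal move: gap in a
            H[i][j] = max(0, diag, up, left)  # floor at 0 makes the alignment local
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTAA", "ACGT"))
```

Removing the zero floor, initializing the first row and column with cumulative gap penalties, and tracing back from the bottom-right cell all the way to the top-left corner turns this into the global Needleman-Wunsch variant.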
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 30270810, 90208022 and 30471067) and IBM Shared University Research (Life Science), China.
Abstract: Mutations (substitutions, deletions, insertions, etc.) in nucleic acids cause the maximal sequence length of exact match (MALE) between paralogous members of a duplication event to become shorter during evolution. In this work, MALE changes between members of 26 gene families from four representative species (Arabidopsis thaliana, Oryza sativa, Mus musculus and Homo sapiens) were investigated. A comparative study of the MALE of paralogs and their amino acid substitution rate (dA < 0.5) indicated a close relationship between the two. The results suggest that MALE could be a sound evolutionary scale for the divergence time of paralogous genes during their early evolution. A reference table relating MALE to divergence time for the four species was set up, which should be widely useful for large-scale genome alignment and comparison. As an example, the detection of large-scale duplication events in the rice genome based on the table was illustrated.
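Reading MALE as the length of the longest common substring between two paralogous sequences (our interpretation of "maximal sequence length of exact match"), it can be computed with a standard dynamic program. The sketch below is an illustration under that assumption, not the authors' tool; the function name `male` is hypothetical, and its O(mn) cost makes it suitable only for gene-length sequences, whereas genome-scale comparisons would normally use suffix-array or suffix-tree methods.

```python
def male(seq1: str, seq2: str) -> int:
    """Length of the longest exact match (common substring) between two sequences."""
    best = 0
    prev = [0] * (len(seq2) + 1)  # match lengths ending at the previous row
    for i in range(1, len(seq1) + 1):
        curr = [0] * (len(seq2) + 1)
        for j in range(1, len(seq2) + 1):
            if seq1[i - 1] == seq2[j - 1]:
                # Extend the exact match that ends at (i-1, j-1) by one base
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(male("ACGTACGT", "TTACGTAA"))
```

Each mutation that falls inside a shared exact stretch splits it, which is why this value is expected to shrink as paralogs diverge.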
Funding: Supported by funds from the National Natural Science Foundation of China (31270222, 31470397 and 31230051); the Key Project on Basic Research from the Science and Technology Commission of Shanghai (14JC1403900); the Project on Breeding from the Agriculture Commission of Shanghai (2013-13); the China Innovative Research Team, Ministry of Education, China; the 111 Project (B14016); the Innovation Program of Shanghai Municipal Education Commission (13ZZ018); the Innovation Program of Shanghai Pudong Science and Technology Commission (PKJ2013-N03); and National Transgenic Major Program Grants 2014ZX08009-003-003.
Abstract: DNA markers play important roles in plant breeding and genetics. The Insertion/Deletion (InDel) marker is a co-dominant DNA marker that is widely used because of its low cost and high precision. However, the canonical way of searching for InDel markers is time-consuming and labor-intensive. We developed an end-to-end computational solution, the InDel Markers Development Platform (IMDP), to identify genome-wide InDel markers in a graphical pipeline environment. IMDP consists of an assembled-genome sequence alignment pipeline (AGA-pipe) and a next-generation re-sequencing data mapping pipeline (NGS-pipe). With AGA-pipe, we identified 12,944 markers between the genomes of the rice cultivars Nipponbare and 93-11. Using NGS-pipe, we reported 34,794 InDels from re-sequencing data of the rice cultivars Wu-Yun-Geng7 and Guang-Lu-Ai4. Combining AGA-pipe and NGS-pipe, we developed 205,659 InDels in eight japonica and nine indica cultivars, of which 2,681 showed a subgroup-specific pattern. Polymerase chain reaction (PCR) analysis of the subgroup-specific markers indicated a precision of 90% (86 of 95). Finally, to make them available to the public, we have integrated the InDel/marker information into a website (Rice InDel Marker Database, RIMD, http://202.120.45.71/). Applying IMDP in rice will make the development of genome-wide InDel markers more efficient, and the platform can also be used in other species with reference genome sequences and NGS data.
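The internals of AGA-pipe are not described here, but the core step of calling InDels from an assembled-genome alignment can be illustrated. The minimal sketch below assumes a gapped pairwise alignment as input (reference row and query row of equal length, gaps written as "-") and reports each run of gaps as one event with its 0-based reference coordinate. The `InDel` record, `call_indels`, and the `min_len` filter are hypothetical names, not part of IMDP or RIMD.

```python
from typing import List, NamedTuple


class InDel(NamedTuple):
    ref_pos: int   # 0-based position in the ungapped reference where the event starts
    length: int
    kind: str      # "insertion" (extra bases in query) or "deletion" (bases missing from query)


def call_indels(ref_aln: str, qry_aln: str, min_len: int = 1) -> List[InDel]:
    """Collapse runs of alignment gaps into InDel events."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    indels: List[InDel] = []
    ref_pos, i, n = 0, 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-" and q == "-":          # gap-gap column carries no information
            i += 1
            continue
        if r != "-" and q != "-":          # aligned base pair: advance the reference
            ref_pos += 1
            i += 1
            continue
        kind = "insertion" if r == "-" else "deletion"
        gap_row = ref_aln if kind == "insertion" else qry_aln
        other_row = qry_aln if kind == "insertion" else ref_aln
        start, j = ref_pos, i
        # Extend the run while the same row keeps its gap
        while j < n and gap_row[j] == "-" and other_row[j] != "-":
            j += 1
        length = j - i
        if kind == "deletion":             # deleted bases still occupy reference positions
            ref_pos += length
        if length >= min_len:
            indels.append(InDel(start, length, kind))
        i = j
    return indels


if __name__ == "__main__":
    print(call_indels("ACGT--ACGTAC", "ACGTTTACG--C"))
```

Raising `min_len` keeps only longer events, which is one simple way to shortlist candidates before designing primers.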
Funding: The National Key Research and Development Program of China under Grant Nos. 2018YFB0204400, 2016YFB0201305, 2016YFB0200803, 2016YFB0200300, and XDC01030000; the National Natural Science Foundation of China under Grant Nos. 6197237 and 61702483; and the CAS QYZDJ-SSW-JSC035 Funding.
Abstract: Genomic sequence alignment is the most critical and time-consuming step in genomic analysis. Alignment algorithms generally follow a seed-and-extend model. Acceleration of the extension phase (e.g., the Smith-Waterman algorithm) has been well explored on computing-centric architectures such as field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and graphics processing units (GPUs). Compared with the extension phase, the seeding phase is more critical. However, the seeding phase is memory-bound: it involves fine-grained random memory accesses and offers limited parallelism on conventional systems. In this paper, we argue that the processing-in-memory (PIM) concept could be a viable solution to these problems. This paper describes "PIM-Align", an application-driven near-data processing architecture for sequence alignment. To achieve performance proportional to memory capacity by taking advantage of 3D-stacked dynamic random access memory (DRAM) technology, we propose a lightweight message mechanism between different memory partitions and a specialized hardware prefetcher for the memory access patterns of sequence alignment. Our evaluation shows that the proposed architecture can achieve 20x and 1820x speedups compared with the best available ASIC implementation and software running on a 32-thread CPU, respectively.
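To make the seed-and-extend model concrete, the sketch below shows a simple hash-based seeding step in Python: the reference is indexed by k-mer, each k-mer of a read is looked up, and hits are grouped by the alignment start they imply (reference position minus read offset). Each lookup touches an essentially unpredictable location in a large index, which is the access pattern the abstract describes as memory-bound. This is an illustrative assumption: production aligners often use other index structures (for example, the FM-index used by BWA), and none of the function names below come from PIM-Align.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def find_seeds(read: str, index: Dict[str, List[int]], k: int) -> List[Tuple[int, int]]:
    """Return (read_offset, reference_pos) pairs for exact k-mer hits."""
    seeds: List[Tuple[int, int]] = []
    for off in range(len(read) - k + 1):
        # One index lookup per read k-mer: the random-access hot spot of seeding
        for ref_pos in index.get(read[off:off + k], []):
            seeds.append((off, ref_pos))
    return seeds


def candidate_loci(seeds: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Group seeds by implied alignment start and sort by vote count (descending)."""
    votes: Dict[int, int] = defaultdict(int)
    for off, ref_pos in seeds:
        votes[ref_pos - off] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    k = 3
    idx = build_kmer_index("ACGTACGTTTGCA", k)
    seeds = find_seeds("CGTTTG", idx, k)
    print(candidate_loci(seeds))
```

The top-voted start would then be handed to the extension phase (for example, a banded Smith-Waterman alignment) to produce the final alignment.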