We identified 528 putative cytochrome P450s (P450s) in Oryza sativa L. ssp. indica by searching with Arabidopsis thaliana P450s as the reference database. These putative rice P450s are thought to belong to 40 families classified in Arabidopsis thaliana. Comparing the family distributions of Arabidopsis thaliana and Oryza sativa P450s, we found that the two species have similar overall patterns, although some differences exist. For example, the number of genes in the CYP71, CYP72, CYP76, CYP89, CYP94 and CYP709 families in rice is more than twice that in Arabidopsis thaliana, whereas CYP705 has 33 members in Arabidopsis thaliana but none in rice. We also found that members of CYP71 and CYP81 are organized as tandem arrays in the rice genome, which may have arisen through duplication events during evolution. Furthermore, we collected expressed sequence tag (EST) evidence for 263 of the putative rice P450s; these are expressed at the transcriptional level and are therefore more likely to be true P450s.
The large number of repeats, especially high-copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) difficult. To address this problem, we sought to identify repeats and mask them prior to assembly, even at the genome survey stage. Repeats of different copy number have different probabilities of appearing in shotgun data; based on this principle, we constructed a statistical model and derived criteria for mathematically defined repeats (MDRs) at different shotgun coverages. Using these criteria, we developed the software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, assembly was faster and had a lower error probability. In addition, because clone-insert size affects the accuracy of repeat assembly and scaffold construction, we used our model to design the length distribution of clone inserts. In our simulated human and rice genomes, the length distributions of repeats differ, so the optimal length distributions of clone inserts were not the same. Thus, with an optimal length distribution of clone inserts, a given genome could be assembled better at lower coverage.
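The idea that copy number changes how often a sequence appears in shotgun data can be illustrated with a minimal sketch. The abstract does not give the statistical model, so everything here is an assumption for illustration: k-mer occurrence counts are treated as Poisson with mean equal to copy number times coverage, a k-mer is called a repeat when its count would be improbable for a single-copy sequence, and the function names (`repeat_threshold`, `detection_probability`, `mask_repeats`) are hypothetical and are not MDRmasker's interface.

```python
import math
from collections import Counter


def poisson_sf(t, mean):
    """Return P(X >= t) for X ~ Poisson(mean)."""
    if t <= 0:
        return 1.0
    cdf = 0.0
    term = math.exp(-mean)  # P(X = 0)
    for i in range(t):
        cdf += term
        term *= mean / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def repeat_threshold(coverage, alpha=0.001):
    """Smallest count t that a single-copy k-mer reaches with probability < alpha.

    Assumption: a single-copy k-mer is seen Poisson(coverage) times.
    """
    t = 0
    while poisson_sf(t, coverage) >= alpha:
        t += 1
    return t


def detection_probability(copy_number, coverage, threshold):
    """Probability that a k-mer with the given copy number reaches the threshold."""
    return poisson_sf(threshold, copy_number * coverage)


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_repeats(read, kmer_counts, k, threshold):
    """Replace every base covered by an over-represented k-mer with 'N'."""
    masked = list(read)
    for i in range(len(read) - k + 1):
        if kmer_counts.get(read[i:i + k], 0) >= threshold:
            masked[i:i + k] = "N" * k
    return "".join(masked)
```

Under these assumptions the threshold rises with coverage, which mirrors the abstract's point that the repeat criteria depend on shotgun coverage: at low coverage only high-copy repeats are likely to exceed the threshold, while lower-copy repeats become detectable as coverage increases.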