A general and flexible multi-motif model is proposed based on dynamic programming. By extending the Gibbs sampler to dynamic programming and introducing a temperature parameter, an efficient algorithm is developed. Branchpoint signal sequences and translation initiation sequences extracted from the rice genome are then examined.
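The abstract does not say how temperature enters the sampler. As a rough illustration only, the sketch below assumes the common annealing form in which each candidate motif position is drawn with probability proportional to its weight raised to the power 1/T. The function names and the toy weights are hypothetical and are not taken from the paper.

```python
import random


def tempered_probabilities(weights, temperature):
    """Turn non-negative weights into sampling probabilities at temperature T.

    Assumed form (not from the paper): p_i is proportional to w_i ** (1 / T).
    T = 1 gives ordinary Gibbs sampling probabilities; larger T flattens them.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [w ** (1.0 / temperature) for w in weights]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_position(weights, temperature, rng):
    """Draw one candidate position index according to the tempered probabilities."""
    probs = tempered_probabilities(weights, temperature)
    return rng.choices(range(len(weights)), weights=probs, k=1)[0]


# Toy weights for two candidate positions (hypothetical values).
toy_weights = [1.0, 4.0]
print(tempered_probabilities(toy_weights, 1.0))
print(tempered_probabilities(toy_weights, 2.0))
print(sample_position(toy_weights, 2.0, random.Random(0)))
```

Raising the temperature moves the distribution toward uniform, which lets the sampler escape local optima early on; lowering it concentrates sampling on the best-scoring positions.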
The human leukocyte antigen (HLA) system is the most polymorphic region known in the human genome. In the present study, we analyzed for the first time the HLA-A gene polymorphisms defined by a high-resolution typing method, sequence-based typing (SBT), in 161 Northern Chinese Han people. A total of 74 different HLA-A genotypes and 36 alleles were detected. The most frequent alleles were A*110101 (GF=0.2360), A*24020101 (GF=0.1646), and A*020101 (GF=0.1553), followed by A*3303 (GF=0.1180), A*3001 (GF=0.0590), and A*310102 (GF=0.0404). The frequencies of the following alleles, A*0203, A*0205, A*0206, A*0207, A*030101, A*2423, A*2601, A*3201, and A*3301, are all higher than 0.0093. The alleles found in homozygous form include A*020101, A*110101, A*24020101, and A*310102. Heterozygosity (H), polymorphism information content (PIC), discrimination power (DP), and probability of paternity exclusion (PPE) of HLA-A in the samples were calculated, and their values were 0.8705, 0.8491, 0.6014, and 0.9475, respectively. These SBT results on HLA-A polymorphism in the Northern Chinese Han population, especially the characterization of allele subtypes, will be of great interest for clinical transplantation, disease-association studies, and forensic identification. High-resolution typing methods reveal a significantly wider spectrum of HLA variation, including rare alleles, and this spectrum will be widely used in many fields.
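The abstract reports H and PIC without stating the formulas. The sketch below assumes the standard definitions, expected heterozygosity H = 1 − Σ pᵢ² and PIC = 1 − Σ pᵢ² − Σ_{i<j} 2pᵢ²pⱼ², and applies them to a made-up three-allele locus. The toy frequencies are hypothetical; the full 36-allele frequency table is not given here, so the reported values of 0.8705 and 0.8491 cannot be reproduced from this listing.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC under the standard definition (assumed, not confirmed by the abstract)."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical allele frequencies for a three-allele locus (must sum to 1).
toy_freqs = [0.5, 0.3, 0.2]
print(round(expected_heterozygosity(toy_freqs), 4))
print(round(polymorphism_information_content(toy_freqs), 4))
```

PIC is never larger than H because it additionally discounts matings in which the offspring genotype is uninformative.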
We report a complete genomic sequence of rare isolates (minor genotype) of the SARS-CoV from SARS patients in Guangdong, China, where the first few cases emerged. The most striking discovery from the isolate is an extra 29-nucleotide sequence located between nucleotide positions 27,863 and 27,864 (relative to the complete sequence of BJ01) within an overlapping region composed of BGI-PUP5 (BGI-postulated uncharacterized protein 5) and BGI-PUP6, upstream of the N (nucleocapsid) protein. The discovery of this minor genotype, GD-Ins29, suggests a significant genetic event and differentiates it from the previously reported genotype, the dominant form among all sequenced SARS-CoV isolates. A 17-nt segment of this extra sequence is identical to a segment of the same size in two human mRNA sequences that may interfere with viral genome replication and transcription in the cytosol of infected cells. It provides a new avenue for exploring virus-host interaction in viral evolution, host pathogenesis, and vaccine development.
The corona-like spikes, or peplomers, seen on the surface of the virion under the electron microscope are the most striking feature of coronaviruses. The S (spike) protein, with 1,255 amino acids, is the largest structural protein encoded in the viral genome. Its structure can be divided into three regions: a long N-terminal region on the exterior, a characteristic transmembrane (TM) region, and a short C-terminus in the interior of the virion. We detected fifteen nucleotide substitutions by comparison with the seventeen published SARS-CoV genome sequences, eight (53.3%) of which are non-synonymous mutations leading to amino acid alterations with predicted physicochemical changes. The possible antigenic determinants of the S protein are predicted, and the result is confirmed by ELISA (enzyme-linked immunosorbent assay) with synthesized peptides. Another notable finding is that three disulfide bonds are defined at the C-terminus with the N-terminus of the E (envelope) protein, based on the typical sequence and positions; if confirmed, this establishes a structural connection between these two important structural proteins. Phylogenetic analysis reveals several conserved regions that might be potent drug targets.
We discovered 528 putative cytochrome P450s (P450s) in Oryza sativa L. ssp. indica using Arabidopsis thaliana P450s as the reference database. These putative rice P450s are thought to belong to 40 families classified in Arabidopsis thaliana. We compared the distributions of Arabidopsis thaliana and Oryza sativa P450s and found that the two species have similar distribution patterns. However, the family distributions of the two species also differ in some respects. For example, in rice, the gene number in families CYP71, CYP72, CYP76, CYP89, CYP94, and CYP709 is more than twice that in Arabidopsis thaliana, and there are 33 CYP705 members in Arabidopsis thaliana but none in rice. We also found that gene members of CYP71 and CYP81 are organized as repeated tandem arrays in the rice genome; they may have arisen by duplication during evolution. Furthermore, we accumulated expressed sequence tag (EST) evidence for 263 putative rice P450s, which are expressed at the transcriptional level and are more likely to be true P450s.
The Coronaviridae family is characterized by a nucleocapsid that is composed of the genome RNA molecule in combination with the nucleoprotein (N protein) within a virion. The most striking physicochemical feature of the N protein of SARS-CoV is that it is a typical basic protein with a high predicted pI and high hydrophilicity, which is consistent with its function of binding to the ribophosphate backbone of the RNA molecule. The predicted high extent of phosphorylation of the N protein at multiple candidate phosphorylation sites suggests that it is related to important functions, such as RNA binding and localization to the nucleolus of host cells. Subsequent study shows that there is an SR-rich region in the N protein and that this region might be involved in protein-protein interaction. The abundant antigenic sites predicted in the N protein, as well as experimental evidence with synthesized polypeptides, indicate that the N protein is one of the major antigens of the SARS-CoV. Compared with other viral structural proteins, the low variation rate of the N protein relative to its size suggests its importance to the survival of the virus.
In order to develop clinical diagnostic tools for rapid detection of SARS-CoV (severe acute respiratory syndrome-associated coronavirus) and to identify candidate proteins for vaccine development, the C-terminal portion of the nucleocapsid (NC) gene was amplified using RT-PCR from the SARS-CoV genome, cloned into a yeast expression vector (pEGH), and expressed as a glutathione S-transferase (GST) and His×6 double-tagged fusion protein under the control of an inducible promoter. Western analysis of the purified protein confirmed the expression and purification of the NC fusion protein from yeast. To determine its antigenicity, the fusion protein was challenged with serum samples from SARS patients and normal controls. The NC fusion protein demonstrated high antigenicity with high specificity, and therefore it should have great potential in designing clinical diagnostic tools and provide useful information for vaccine development.
The porcine major histocompatibility complex (MHC, also named swine leukocyte antigen, SLA) is associated not only with immune responsiveness and disease susceptibility, but also with some reproductive and production traits such as growth rate and carcass composition. No systematic study of the SLA expression profile has yet been reported. In order to illustrate SLA expression comprehensively and deepen our understanding of its function, we outlined the expression profile of SLA in 51 tissues of Landrace by analyzing a large number of ESTs produced by the "Sino-Danish Porcine Genome Project". In addition, we compared the expression profile of SLA in several tissues from different developmental stages and from another breed (Erhualian). The results show that: (i) classical SLA genes are highly expressed in immune tissues and the middle part of the intestine; (ii) although SLA-3 is an SLA Ia gene, its expression abundance and pattern are quite different from those of the other two SLA Ia genes. The same phenomenon is seen in HLA-C expression, suggesting that the two genes may function similarly and have undergone convergent evolution; (iii) except in jejunum, the antigen-presenting genes are more highly expressed in Erhualian than in Landrace. The difference might be associated with the higher resistance of Erhualian to adverse conditions (including pathogens) and the higher growth rate of Landrace.
For some historic reasons, our new journal is named 'Genomics, Proteomics & Bioinformatics', or, as we have nicknamed it for short, the Journal of GPB. A growing number of '-ome' and '-omics' terms have appeared in many diverse fields of biology, especially in recent years under the profound influence of the Human Genome Project and many other genome projects completed or in progress. We almost attempted to rename this journal 'Ever-more-omics' to include all the newcomers. However, on second thought, we have decided to entertain these 'Three Kingdoms' first while keeping an eye on the others.
We recently reported the use of a gene-trapping approach to isolate cell clones in which a reporter gene had integrated into genes modulated by T-cell activation. We have now tested a panel of clones from that report and identified one that responds to a variety of G-protein coupled receptors (GPCRs). The β-lactamase tagged EGR-3 Jurkat cell was used to dissect specific GPCR signaling in vivo. Three GPCRs were studied: the chemokine receptor CXCR4 (Gi-coupled), which was endogenously expressed, and the platelet activation factor (PAF) receptor (Gq-coupled) and β2 adrenergic receptor (Gs-coupled), both of which were stably transfected. Agonists for each receptor activated transcription of the β-lactamase tagged EGR-3 gene. Induction of EGR-3 through CXCR4 was blocked by pertussis toxin and PD58059, a specific inhibitor of MEK (MAPK/ERK kinase). Neither of these inhibitors blocked isoproterenol- or PAF-mediated activation of EGR-3. Conversely, β2- and PAF-mediated EGR-3 activation was blocked by the p38-specific inhibitor SB580. In addition, both β2- and PAF-mediated EGR-3 activation could be synergistically enhanced by CXCR4 activation. Taken together, these results indicate that EGR-3 can be activated through distinct signal transduction pathways by different GPCRs and that signals can be integrated and amplified to efficiently tune the level of activation.
The E (envelope) protein is the smallest structural protein in all coronaviruses and is the only viral structural protein in which no variation has been detected. We conducted genome sequencing and phylogenetic analyses of SARS-CoV. Based on genome sequencing, we predicted that the E protein is a transmembrane (TM) protein characterized by a TM region with strong hydrophobicity and α-helix conformation. We identified a segment (NH2-L-Cys-A-Y-Cys-Cys-N-COOH) in the carboxyl-terminal region of the E protein that appears to form three disulfide bonds with another segment of corresponding cysteines in the carboxyl-terminus of the S (spike) protein. These bonds point to a possible structural association between the E and S proteins. Our phylogenetic analyses of the E protein sequences in all published coronaviruses place SARS-CoV in an independent group in Coronaviridae and suggest a non-human animal origin.
Expressed Sequence Tag (EST) analysis has pioneered genome-wide gene discovery and expression profiling. In order to establish a gene expression index in the rice cultivar indica, we sequenced and analyzed 86,136 ESTs from nine rice cDNA libraries from the super hybrid cultivar LYP9 and its parental cultivars. We assembled these ESTs into 13,232 contigs, leaving 8,976 singletons. Overall, 7,497 sequences were found to be similar to existing sequences in GenBank and 14,711 are novel. These sequences are classified by molecular function, biological process, and pathway according to the Gene Ontology. We compared our sequenced ESTs with the publicly available 95,000 ESTs from japonica and found little sequence variation, despite the large difference between the genome sequences. We then assembled the combined 173,000 rice ESTs for further analysis. Using the pooled ESTs, we compared gene expression in metabolic pathways between rice and Arabidopsis according to KEGG. We further profiled gene expression patterns in different tissues, in different developmental stages, and in a conditional sterile mutant, after checking by means of sequence coverage that the libraries are comparable. We also identified some possible library-specific genes and a number of enzymes and transcription factors that contribute to rice development.
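The assembly counts quoted in this abstract can be cross-checked with simple arithmetic: contigs plus singletons should equal the number of sequences that were then split into GenBank-similar and novel. The short sketch below does that check using only the figures stated above; the variable names are illustrative.

```python
# Figures quoted in the abstract.
total_ests = 86_136
contigs = 13_232
singletons = 8_976
similar_to_genbank = 7_497
novel = 14_711

# Contigs and singletons together form the set of assembled unique sequences.
unique_sequences = contigs + singletons

# The similar/novel split should account for every unique sequence.
assert unique_sequences == similar_to_genbank + novel

novel_fraction = novel / unique_sequences
ests_per_sequence = total_ests / unique_sequences

print(unique_sequences)
print(f"novel fraction: {novel_fraction:.1%}")
print(f"mean ESTs per unique sequence: {ests_per_sequence:.2f}")
```

The two totals agree at 22,208 sequences, so roughly two thirds of the assembled sequences had no GenBank match at the time of the study.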
We studied structural and immunological properties of the SARS-CoV M (membrane) protein, based on comparative analyses of sequence features, phylogenetic investigation, and experimental results. The M protein is predicted to contain a triple-spanning transmembrane (TM) region, a single N-glycosylation site near its N-terminus, which is on the exterior of the virion, and a long C-terminal region in the interior. According to published data, the M protein has a higher substitution rate (0.6% relative to its size) among viral open reading frames (ORFs). The four substitutions detected in the M protein, which cause non-synonymous changes, can be classified into three types. The first results in changes of pI (isoelectric point) and charge, affecting antigenicity. The second changes the hydrophobicity of the TM region, and the third relates to the hydrophilicity of the interior structure. Phylogenetic tree building based on the variations of the M protein appears to support the non-human origin of SARS-CoV. To investigate its immunogenicity, we synthesized eight oligopeptides covering 69.2% of the entire ORF and screened them by ELISA (enzyme-linked immunosorbent assay) with sera from SARS patients. The results confirmed our predictions of antigenic sites.
The large number of repeats, especially high-copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) quite difficult. In order to solve this problem, we tried to identify repeats and mask them prior to assembly, even at the stage of genome survey. It is known that repeats of different copy number have different probabilities of appearance in shotgun data, so based on this principle we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. According to these criteria, we developed the software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, the speed of assembly was increased and the error probability was lowered. In addition, clone-insert size affects the accuracy of repeat assembly and scaffold construction. We also designed the length distribution of clone-inserts using our model. In our simulated genomes of human and rice, the length distributions of repeats are different, so their optimal length distributions of clone-inserts were not the same. Thus, with an optimal length distribution of clone-inserts, a given genome could be assembled better at lower coverage.
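The abstract does not describe the statistical model inside MDRmasker. To illustrate the general principle it states (high-copy sequence shows up more often in shotgun data than single-copy sequence), the sketch below assumes a simple Poisson model: if a single-copy k-mer is expected to occur `lam` times in the reads, any k-mer seen at least `threshold` times, where the Poisson tail probability drops below `alpha`, is flagged and masked with `N`. All names, the Poisson assumption, and the toy reads are hypothetical and do not describe MDRmasker itself.

```python
import math
from collections import Counter


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def repeat_threshold(lam, alpha=1e-3):
    """Smallest count that a single-copy k-mer reaches with probability below alpha."""
    k = 1
    while poisson_tail(k, lam) >= alpha:
        k += 1
    return k


def mask_repeats(reads, k, lam, alpha=1e-3):
    """Replace every k-mer whose count reaches the threshold with N characters."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    threshold = repeat_threshold(lam, alpha)
    masked = []
    for read in reads:
        chars = list(read)
        for i in range(len(read) - k + 1):
            if counts[read[i:i + k]] >= threshold:
                chars[i:i + k] = "N" * k
        masked.append("".join(chars))
    return masked


# Toy reads: the 3-mer ACG occurs three times, every other 3-mer once.
toy_reads = ["ACGTT", "ACGAA", "ACGCC", "TTTGA"]
print(repeat_threshold(lam=0.05))
print(mask_repeats(toy_reads, k=3, lam=0.05))
```

Because the threshold depends on `lam`, the same criterion adapts to different shotgun coverages, which mirrors the abstract's point that the repeat criteria are coverage-specific.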
Since the pig is an important livestock species worldwide, its gene expression has been investigated intensively, but rarely in the brain. In order to study gene expression profiles in the pig central nervous system, we sequenced and analyzed 43,122 high-quality 5′-end expressed sequence tags (ESTs) from porcine cerebellum, cerebral cortex, and brain stem cDNA libraries, involving several different prenatal and postnatal developmental stages. The initial ESTs were assembled into 16,101 clusters and compared to protein and nucleic acid databases in GenBank. Of these clusters, 30.6% matched protein databases and represented sequences of known function; 75.1% had significant hits to nucleic acid databases and partially represented known functions; 73.3% matched known porcine ESTs; and 21.5% had no matches to any known sequences in GenBank. We used the categories defined by the Gene Ontology to survey gene expression in the porcine brain.
To obtain an initial overview of gene diversity and expression pattern in porcine thymus, 11,712 ESTs (Expressed Sequence Tags) from 100-day-old porcine thymus (FTY) were sequenced and 7,071 cleaned ESTs were used for gene expression analysis. Clustering with the PHRAP program yielded 959 contigs and 3,074 singlets. A BLAST search showed that 806 contigs and 1,669 singlets (5,442 ESTs in total) had homologues in GenBank and 1,629 ESTs were novel. According to the Gene Ontology classification, 36.99% of the ESTs were assigned to the gene expression group, indicating that although the functional genes of the thymus (18.78% in the defense group) are expressed to a certain degree, the 100-day-old porcine thymus is still at a developmental stage. Comparative analysis showed that the gene expression pattern of the 100-day-old porcine thymus is similar to that of the human infant thymus.
Beijing has been one of the epicenters attacked most severely by the SARS-CoV (severe acute respiratory syndrome-associated coronavirus) since the first patient was diagnosed in one of the city's hospitals. We now report complete genome sequences of the BJ Group, including four isolates (Isolates BJ01, BJ02, BJ03, and BJ04) of the SARS-CoV. It is remarkable that all members of the BJ Group share a common haplotype, consisting of seven loci that differentiate the group from other isolates published to date. Among 42 substitutions uniquely identified from the BJ Group, 32 are non-synonymous changes at the amino acid level. Rooted phylogenetic trees, proposed on the basis of haplotypes and other sequence variations of SARS-CoV isolates from Canada, USA, Singapore, and China, gave rise to different paradigms but positioned the BJ Group, together with the newly discovered GD01 (GD-Ins29), in the same clade, followed by the H-U Group (from Hong Kong to USA) and the H-T Group (from Hong Kong to Toronto), leaving the SP Group (Singapore) more distant. This result appears to suggest a possible transmission path from Guangdong to Beijing/Hong Kong, then to other countries and regions.
Although various genome projects have provided us with enormous amounts of static sequence information, understanding complex biology continues to require the integration of computational modeling, system analysis, technology development for experiments, and quantitative experiments to analyze biological architecture at various levels, which is the origin of systems biology as a discipline. This review discusses the object, characteristics, and research focus of systems biology, and summarizes the analysis methods, experimental technologies, and research developments in its four key fields: systemic structures, dynamics, control methods, and design principles.
The origin of SARS-CoV, the pathogen of severe acute respiratory syndrome (SARS), remains a mystery even though a few isolates of the virus have been completely sequenced. To explore the genesis of SARS-CoV, the FDOD method previously developed by us was applied to compare complete genomes from 12 SARS-CoV isolates with those from 12 previously identified coronaviruses, and an unrooted phylogenetic tree was constructed. Our results show that all SARS-CoV isolates clustered into one clique and the previously identified coronaviruses formed the other. Meanwhile, the three groups of coronaviruses are clearly separated from each other in our tree, which is consistent with the results of previous papers. In contrast to those papers, the topology of our phylogenetic tree indicates that SARS-CoV is closer to group 1 within the genus Coronavirus. The topology also shows that the 12 SARS-CoV isolates may be divided into two groups according to their association with the SARS-CoV from Hotel M in Hong Kong, which may give some information about the chain of SARS infection.
Funding: the Special Funds for Major National Basic Research Projects and the National Natural Science Foundation of China.
文摘A general and flexible multi-motif model is proposed based on dynamic programming. By extending theGibbs sampler to the dynamic programming and introducing temperature, an efficient algorithm is developed. Branchpoint signalsequences and translation initiation sequences extracted from the rice genome are then examined.
文摘Human leukocyte antigen (HLA) system is the most polymorphic region known in the human genome. In the present study, we analyzed for the first time the HLA-A gene polymorphisms defined by the high-resolution typing methods-sequence-based typing (SBT) in 161 Northern Chinese Han people. A total of 74 different HLA-A gene types and 36 alleles were detected. The most frequent alleles were A*110101 (GP=0.2360), A*24020101 (GF=0.1646), and A*020101 (GF=0.1553); followed by A*3303 (GF=0.1180), A*3001 (GF=0.0590), and A*310102 (GF=0.0404). The frequencies of following alleles, A*0203, A*0205, A*0206, A*0207, A*030101, A*2423, A*2601, A*3201, and A*3301, are all higher than 0.0093. The homozygous alleles include A*020101, A*110101, A*24020101 and A*310102. Heterozygosity (H), polymorphism information content (PIC), discrimination power (DP) and probability of paternity exclusion (PPE) of HLA-A in the samples were calculated and their values were 0.8705, 0.8491, 0.6014, and 0.9475, respectively. These results by SBT analysis of HLA-A polymorphism in Northern Chinese Han population, especially the allele subtypes character, will be of great interest for clinical transplantation, disease-associated study and forensic identification. Implementation of high-resolution typing methods allows a significantly wider spectrum of HLA variation including rare alleles. This spectrum will further be extensively utilized in many fields.
文摘We report a complete genomic sequence of rare isolates (minor genotype) of the SARS-CoV from SARS patients in Guangdong, China, where the first few cases emerged. The most striking discovery from the isolate is an extra 29-nucleotide sequence located at the nucleotide positions between 27,863 and 27,864 (referred to the complete sequence of BJ01) within an overlapped region composed of BGI-PUP5 (BGI-postulated uncharacterized protein 5) and BGI-PUP6 upstream of the N (nucleocapsid) protein. The discovery of this minor genotype, GD-Ins29, suggests a significant genetic event and differentiates it from the previously reported genotype, the dominant form among all sequenced SARS-CoV isolates. A 17-nt segment of this extra sequence is identical to a segment of the same size in two human mRNA sequences that may interfere with viral genome replication and transcription in the cytosol of the infected cells. It provides a new avenue for the exploration of the virus-host interaction in viral evolution, host pathogenesis, and vaccine development.
文摘The corona-like spikes or peplomers on the surface of the virion under electronic microscope are the most striking features of coronaviruses. The S (spike) protein is the largest structural protein, with 1,255 amino acids, in the viral genome. Its structure can be divided into three regions: a long N-terminal region in the exterior, a characteristic transmembrane (TM) region, and a short C-terminus in the interior of a virion. We detected fifteen substitutions of nucleotides by comparisons with the seventeen published SARS-CoV genome sequences, eight (53.3%) of which are non-synonymous mutations leading to amino acid alternations with predicted physiochemical changes. The possible antigenic determinants of the S protein are predicted, and the result is confirmed by ELISA (enzyme-linked immunosorbent assay) with synthesized peptides. Another profound finding is that three disulfide bonds are defined at the C-terminus with the N-terminus of the E (envelope) protein, based on the typical sequence and positions, thus establishing the structural connection with these two important structural proteins, if confirmed. Phyloge-netic analysis reveals several conserved regions that might be potent drug targets.
文摘We discovered 528 putative cytochrome P450s (P450s) in Oryza sativa L. ssp. indica using Arabidopsis thaliana P450s as database. Those putative rice P450s are thought to belong to 40 families classified in Arabidopsis thaliana. We compared distributions of Arabidopsis thaliana and Oryza sativa P450s and found the two species have similar distribution patterns. However, family distributions of two species also have some differences. For example, in rice, the gene number in families of CYP71, CYP72, CYP76, CYP89, CYP94 and CYP709 is more than twice that in Arabidopsis thaliana; and there are 33 CYP705 members in Arabidopsis thaliana but none in rice. We also found gene members in CYP71 and CYP81 are organized as tandem arrays repeated in the rice genome; maybe they are duplications in the evolutionary event. Furthermore, we accumulated expression sequence tag (EST) evidence for 263 putative rice P450s, which are expressed at transcriptional level and more likely to be true P450s.
文摘The Coronaviridae family is characterized by a nucleocapsid that is composed of the genome RNA molecule in combination with the nucleoprotein (N protein) within a virion. The most striking physiochemical feature of the N protein of SARS-CoV is that it is a typical basic protein with a high predicted pI and high hydrophilicity, which is consistent with its function of binding to the ribophosphate backbone of the RNA molecule. The predicted high extent of phosphorylation of the N protein on multiple candidate phosphorylation sites demonstrates that it would be related to important functions, such as RNA-binding and localization to the nucleolus of host cells. Subsequent study shows that there is an SR-rich region in the N protein and this region might be involved in the protein-protein interaction. The abundant antigenic sites predicted in the N protein, as well as experimental evidence with synthesized polypeptides, indicate that the N protein is one of the major antigens of the SARS-CoV. Compared with other viral structural proteins, the low variation rate of the N protein with regards to its size suggests its importance to the survival of the virus.
文摘In order to develop clinical diagnostic tools for rapid detection of SARS-CoV (severe acute respiratory syndrome-associated coronavirus) and to identify candidate proteins for vaccine development, the C-terminal portion of the nucleocapsid (NC) gene was amplified using RT-PCR from the SARS-CoV genome, cloned into a yeast expression vector (pEGH), and expressed as a glutathione S-transferase (GST) and Hisx6 double-tagged fusion protein under the control of an inducible promoter. Western analysis on the purified protein confirmed the expression and purification of the NC fusion proteins from yeast. To determine its antigenicity, the fusion protein was challenged with serum samples from SARS patients and normal controls. The NC fusion protein demonstrated high antigenicity with high specificity, and therefore, it should have great potential in designing clinical diagnostic tools and provide useful information for vaccine development.
文摘The porcine major histocompatibility complex (MHC, also named swine leukocyte antigen, SLA) is associ- ated not only with immune responsibility and disease suscep- tibility, but also with some reproductive and productive traits such as growth rate and carcass composition. As yet system- atical research on SLA expression profile is not reported. In order to illustrate SLA expression comprehensively and deepen our understanding of its function, we outlined the expression profile of SLA in 51 tissues of Landrace by ana- lyzing a large amount of ESTs produced by “Sino-Danish Porcine Genome Project”. In addition, we also compared the expression profile of SLA in several tissues from different development stages and from another breed (Erhualian). The result shows: (i) classical SLA genes are highly expressed in immune tissues and middle part of intestine; (ii) although SLA-3 is an SLA Ia gene, its expression abundance and pat- tern are quite different from those of the other two SLA Ia genes. The same phenomenon is seen in HLA-C expression, suggesting that the two genes may function similarly and undergo convergent evolution; (iii) except in jejunum, the antigen presenting genes are more highly expressed in breed Erhualian than in Landrace. The difference might associate with the higher resistance to bad conditions (including pathogens) of Erhualian and higher growth rates of Land- race.
Abstract: For historical reasons, our new journal is named 'Genomics, Proteomics & Bioinformatics', or, as we have nicknamed it for short, the Journal of GPB. A growing number of '-omes' and '-omics' have appeared in many diverse fields of biology, especially in recent years under the profound influence of the Human Genome Project and many other genome projects completed or in progress. We nearly attempted to rename this journal 'Ever-more-omics' to include all the newcomers. However, on second thought, we decided to entertain these 'Three Kingdoms' first while keeping an eye on the others.
Abstract: We recently reported the use of a gene-trapping approach to isolate cell clones in which a reporter gene had integrated into genes modulated by T-cell activation. We have now tested a panel of clones from that report and identified one that responds to a variety of G-protein coupled receptors (GPCRs). The β-lactamase tagged EGR-3 Jurkat cell was used to dissect specific GPCR signaling in vivo. Three GPCRs were studied: the chemokine receptor CXCR4 (Gi-coupled), which was endogenously expressed, and the platelet activation factor (PAF) receptor (Gq-coupled) and the β2 adrenergic receptor (Gs-coupled), both of which were stably transfected. Agonists for each receptor activated transcription of the β-lactamase tagged EGR-3 gene. Induction of EGR-3 through CXCR4 was blocked by pertussis toxin and by PD58059, a specific inhibitor of MEK (MAPK/ERK kinase). Neither of these inhibitors blocked isoproterenol- or PAF-mediated activation of EGR-3. Conversely, β2- and PAF-mediated EGR-3 activation was blocked by the p38-specific inhibitor SB580. In addition, both β2- and PAF-mediated EGR-3 activation could be synergistically enhanced by CXCR4 activation. Taken together, these results indicate that EGR-3 can be activated through distinct signal transduction pathways by different GPCRs, and that the signals can be integrated and amplified to efficiently tune the level of activation.
Abstract: The E (envelope) protein is the smallest structural protein in all coronaviruses and is the only viral structural protein in which no variation has been detected. We conducted genome sequencing and phylogenetic analyses of SARS-CoV. Based on genome sequencing, we predicted that the E protein is a transmembrane (TM) protein characterized by a TM region with strong hydrophobicity and an α-helix conformation. We identified a segment (NH2-_L-Cys-A-Y-Cys-Cys-N_-COOH) in the carboxyl-terminal region of the E protein that appears to form three disulfide bonds with another segment of corresponding cysteines in the carboxyl-terminus of the S (spike) protein. These bonds point to a possible structural association between the E and S proteins. Our phylogenetic analyses of the E protein sequences of all published coronaviruses place SARS-CoV in an independent group in Coronaviridae and suggest a non-human animal origin.
Abstract: Expressed Sequence Tag (EST) analysis has pioneered genome-wide gene discovery and expression profiling. To establish a gene expression index for the rice cultivar indica, we sequenced and analyzed 86,136 ESTs from nine rice cDNA libraries from the super hybrid cultivar LYP9 and its parental cultivars. We assembled these ESTs into 13,232 contigs, leaving 8,976 singletons. Overall, 7,497 sequences were found to be similar to existing sequences in GenBank and 14,711 were novel. These sequences were classified by molecular function, biological process, and pathway according to the Gene Ontology. We compared our sequenced ESTs with the 95,000 publicly available ESTs from japonica and found little sequence variation, despite the large difference between the genome sequences. We then assembled the combined 173,000 rice ESTs for further analysis. Using the pooled ESTs, we compared gene expression in metabolic pathways between rice and Arabidopsis according to KEGG. We further profiled gene expression patterns in different tissues, at different developmental stages, and in a conditional sterile mutant, after checking by means of sequence coverage that the libraries were comparable. We also identified some possible library-specific genes and a number of enzymes and transcription factors that contribute to rice development.
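The assembly counts quoted in this abstract can be cross-checked with a little arithmetic: contigs plus singletons should equal GenBank-similar plus novel sequences. The sketch below is only a reader's consistency check, not part of the authors' pipeline; the function name `check_est_counts` is a hypothetical helper introduced here.

```python
def check_est_counts(contigs, singletons, matched, novel):
    """Cross-check EST assembly counts reported in an abstract.

    Returns (total unique sequences, whether the two breakdowns agree,
    fraction of unique sequences reported as novel).
    """
    total = contigs + singletons          # unique sequences after assembly
    consistent = total == matched + novel  # both breakdowns should sum to the same total
    return total, consistent, novel / total


# Figures taken from the abstract above.
total, consistent, novel_fraction = check_est_counts(
    contigs=13_232, singletons=8_976, matched=7_497, novel=14_711
)
print(total, consistent, round(novel_fraction, 2))
```

With the abstract's figures the two breakdowns agree on 22,208 unique sequences, of which roughly two thirds are reported as novel.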
Abstract: We studied the structural and immunological properties of the SARS-CoV M (membrane) protein, based on comparative analyses of sequence features, phylogenetic investigation, and experimental results. The M protein is predicted to contain a triple-spanning transmembrane (TM) region, a single N-glycosylation site near its N-terminus on the exterior of the virion, and a long C-terminal region in the interior. According to published data, the M protein has one of the higher substitution rates (0.6%, relative to its size) among viral open reading frames (ORFs). The four substitutions detected in the M protein, all of which cause non-synonymous changes, can be classified into three types. The first results in changes of pI (isoelectric point) and charge, affecting antigenicity. The second changes the hydrophobicity of the TM region, and the third relates to the hydrophilicity of the interior structure. Phylogenetic tree building based on the variations of the M protein appears to support a non-human origin of SARS-CoV. To investigate its immunogenicity, we synthesized eight oligopeptides covering 69.2% of the entire ORF and screened them by ELISA (enzyme-linked immunosorbent assay) with sera from SARS patients. The results confirmed our predictions of antigenic sites.
Abstract: The large number of repeats, especially high-copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) quite difficult. To address this problem, we tried to identify repeats and mask them prior to assembly, even at the genome survey stage. Repeats of different copy number have different probabilities of appearing in shotgun data; based on this principle, we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. According to these criteria, we developed the software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, assembly speed increased and the error probability decreased. In addition, clone-insert size affects the accuracy of repeat assembly and scaffold construction, so we also used our model to design the length distribution of clone inserts. In our simulated human and rice genomes, the length distributions of repeats differ, so the optimal length distributions of clone inserts were not the same. With an optimal length distribution of clone inserts, a given genome can be assembled better at lower coverage.
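The principle stated in this abstract, that copy number changes how often a sequence appears in shotgun data, can be illustrated with a simple Poisson sketch. This is not the authors' statistical model or the MDRmasker criteria, which the abstract does not specify. It assumes that the number of reads covering a unique site is Poisson-distributed with mean equal to the coverage, and that a segment with n copies is sampled n times as often. The function names and the significance level `alpha` are hypothetical.

```python
import math


def poisson_sf(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    cdf = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


def expected_hits(coverage, copy_number):
    """Expected occurrences of a segment, assuming sampling scales with copy number."""
    return coverage * copy_number


def min_count_for_repeat(coverage, alpha=0.01, max_count=1000):
    """Smallest occurrence count that unique sequence reaches with probability < alpha.

    Segments seen at least this often are unlikely to be single-copy
    under the Poisson assumption, so they are candidate repeats.
    """
    for k in range(1, max_count + 1):
        if poisson_sf(k, coverage) < alpha:
            return k
    raise ValueError("no threshold found below max_count")


# Thresholds rise with coverage, so the criterion must be coverage-specific.
for cov in (0.5, 1.0, 2.0):
    print(cov, min_count_for_repeat(cov), expected_hits(cov, copy_number=10))
```

Under these assumptions the threshold depends on coverage, which matches the abstract's point that repeat criteria have to be inferred separately for each shotgun coverage.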
Funding: This work was supported by the National High-Tech Research and Development Program of China (No. 2002AA229061) and the Major Knowledge Innovation Programs of the Chinese Academy of Sciences (No. KSCX1-01).
Abstract: Because the pig is an important livestock species worldwide, its gene expression has been investigated intensively, but rarely in the brain. To study gene expression profiles in the pig central nervous system, we sequenced and analyzed 43,122 high-quality 5'-end expressed sequence tags (ESTs) from porcine cerebellum, cerebral cortex, and brain stem cDNA libraries, covering several different prenatal and postnatal developmental stages. The initial ESTs were assembled into 16,101 clusters and compared to protein and nucleic acid databases in GenBank. Of these clusters, 30.6% matched protein databases and represented sequences of known function; 75.1% had significant hits to nucleic acid databases and partially represented known functions; 73.3% matched known porcine ESTs; and 21.5% had no matches to any known sequences in GenBank. We used the categories defined by the Gene Ontology to survey gene expression in the porcine brain.
Funding: This work was supported by the Sino-Danish Pig Genome Project.
Abstract: To obtain an initial overview of gene diversity and expression patterns in porcine thymus, 11,712 ESTs (Expressed Sequence Tags) from 100-day-old porcine thymus (FTY) were sequenced, and 7,071 cleaned ESTs were used for gene expression analysis. Clustering with the PHRAP program yielded 959 contigs and 3,074 singlets. A BLAST search showed that 806 contigs and 1,669 singlets (5,442 ESTs in total) had homologues in GenBank and that 1,629 ESTs were novel. According to the Gene Ontology classification, 36.99% of ESTs were assigned to the gene expression group, indicating that although the functional genes of the thymus (18.78% in the defense group) are expressed to a certain degree, the 100-day-old porcine thymus is still at a developmental stage. Comparative analysis showed that the gene expression pattern of the 100-day-old porcine thymus is similar to that of the human infant thymus.
Abstract: Beijing has been one of the epicenters most severely affected by SARS-CoV (severe acute respiratory syndrome-associated coronavirus) since the first patient was diagnosed in one of the city's hospitals. We now report complete genome sequences of the BJ Group, comprising four isolates (BJ01, BJ02, BJ03, and BJ04) of SARS-CoV. Remarkably, all members of the BJ Group share a common haplotype consisting of seven loci that differentiate the group from other isolates published to date. Among 42 substitutions uniquely identified in the BJ Group, 32 are non-synonymous changes at the amino acid level. Rooted phylogenetic trees, proposed on the basis of haplotypes and other sequence variations of SARS-CoV isolates from Canada, the USA, Singapore, and China, gave rise to different paradigms but positioned the BJ Group, together with the newly discovered GD01 (GD-Ins29), in the same clade, followed by the H-U Group (from Hong Kong to the USA) and the H-T Group (from Hong Kong to Toronto), leaving the SP Group (Singapore) more distant. This result appears to suggest a possible transmission path from Guangdong to Beijing/Hong Kong, and then to other countries and regions.
Abstract: Although various genome projects have provided an enormous amount of static sequence information, understanding sophisticated biology still requires integrating computational modeling, system analysis, technology development for experiments, and quantitative experiments to analyze biological architecture at various levels; this is precisely the origin of systems biology as a discipline. This review discusses the object of systems biology, its characteristics, and its research focus, and summarizes the analysis methods, experimental technologies, and research developments in the four key fields of systems biology: systemic structures, dynamics, control methods, and design principles.
Abstract: The origin of SARS-CoV, the pathogen of severe acute respiratory syndrome (SARS), remains unknown even though several isolates of the virus have been completely sequenced. To explore the genesis of SARS-CoV, we applied the FDOD method, which we developed previously, to compare complete genomes from 12 SARS-CoV isolates with those from 12 previously identified coronaviruses, and constructed an unrooted phylogenetic tree. Our results show that all SARS-CoV isolates cluster into one clique and the previously identified coronaviruses form another. The three groups of coronaviruses are clearly separated from each other in our tree, which is consistent with the results of previous papers. However, from the topology of the phylogenetic tree we found that SARS-CoV is closer to group 1 within the genus Coronavirus. The topology also shows that the 12 SARS-CoV isolates may be divided into two groups according to their association with the SARS-CoV from Hotel M in Hong Kong, which may give some information about the transmission relationships of SARS.