Abstract: Pseudouridine (Ψ) is the most prevalent post-transcriptional RNA modification and is widespread in small cellular RNAs and mRNAs. However, the functions, mechanisms, and precise distribution of Ψs (especially in mRNAs) remain largely unclear, and the landscape of Ψs across the transcriptome has not yet been fully delineated. Here, we present a highly effective model based on a convolutional neural network (CNN), called PseudoUridyLation Site Estimator (PULSE), to analyze large-scale profiling data of Ψ sites and characterize the contextual sequence features of pseudouridylation. PULSE, consisting of two alternately stacked convolution and pooling layers followed by a fully connected neural network, can automatically learn the hidden patterns of pseudouridylation from local sequence information. Extensive validation tests demonstrated that PULSE can outperform other state-of-the-art prediction methods and achieve high prediction accuracy, thus enabling us to further characterize the transcriptome-wide landscape of Ψ sites. We further showed that the prediction results derived from PULSE can provide novel insights into the functional roles of pseudouridylation, such as the regulation of RNA secondary structure, codon usage, translation, and RNA stability, and the connection to single nucleotide variants. The source code and final model for PULSE are available at https://github.com/mlcb-thu/PULSE.
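As a rough illustration of how local sequence information can be presented to a CNN of the kind described in this abstract, the sketch below one-hot encodes a fixed-size window around a candidate uridine. The window half-width, the padding symbol `N` and the function names are assumptions made for this example; they are not taken from the PULSE code.

```python
# Hypothetical helpers for illustration only; not part of the PULSE code base.
ALPHABET = "ACGU"

def extract_window(seq, center, flank, pad="N"):
    """Return the window of 2*flank+1 bases around seq[center], padded at the ends."""
    left = center - flank
    right = center + flank + 1
    prefix = pad * max(0, -left)
    suffix = pad * max(0, right - len(seq))
    return prefix + seq[max(0, left):min(len(seq), right)] + suffix

def one_hot(window):
    """Encode each base as a 4-element row; padding or unknown bases become all zeros."""
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in window]

seq = "GGAUCCA"
window = extract_window(seq, center=3, flank=2)  # centered on the U at index 3
matrix = one_hot(window)                         # 5 rows x 4 columns
```

A matrix of this shape (window length by four channels) is a common input layout for one-dimensional convolutions over nucleotide sequences.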
Funding: Supported by the National Natural Science Foundation of China (31260015), the Natural Science Foundation of Qinghai Province (2012-Z-919Q), the Extramural Project from the State Key Laboratory for Agrobiotechnology (2012SKLAB06-5) and the Research Funds for Young Project of Qinghai University (2011-QYY-1).
Abstract: Synthetic biology promises to simplify the construction of metabolic pathways by assembling the detached modules of the whole pathway. This gives new approaches for the microbial production of industrial products such as polyhydroxyalkanoates (PHA). In this study, to produce poly(3-hydroxybutyrate-co-3-hydroxyhexanoate) (PHBHHx) by Pseudomonas stutzeri 1317 from unrelated carbon sources such as glucose, the phaC1-phaZ-phaC2 operon of P. stutzeri 1317 was knocked out to generate the PHA-deficient mutant P. stutzeri 1317LF. Then three modules containing phaCahAReBRe, phaCahBReGep and phaCAhPah were introduced into P. stutzeri 1317LF separately. The shake flask results indicated that the precursor supply and PHA synthase activity were the vital factors for PHBHHx accumulation in P. stutzeri 1317LF. Furthermore, PHBHHx accumulation by the recombinants from different carbon sources was examined. The highest PHBHHx content was 23.7% (by mass) with a 3HB fraction of 58.6% (by mole). These results provide a basis for further improving the PHBHHx accumulation of P. stutzeri from unrelated carbon sources.
Funding: National Key R&D Program of China, Grant/Award Number: 2021YFF1200900; National Natural Science Foundation of China, Grant/Award Numbers: 61721003, 62250005, 62103227.
Abstract: Complicated molecular alterations in tumors generate various mutant peptides. Some of these mutant peptides can be presented on the cell surface and then elicit immune responses; such mutant peptides are called neoantigens. Accurate detection of neoantigens could help to design personalized cancer vaccines. Although some computational frameworks for neoantigen detection have been proposed, most of them can only detect SNV- and indel-derived neoantigens. In addition, current frameworks adopt oversimplified neoantigen prioritization strategies. These factors hinder the comprehensive and effective detection of neoantigens. We developed NeoHunter, flexible software to systematically detect and prioritize neoantigens from sequencing data in different formats. NeoHunter can detect not only SNV- and indel-derived neoantigens but also gene fusion- and aberrant splicing-derived neoantigens. NeoHunter supports both direct and indirect immunogenicity evaluation strategies to prioritize candidate neoantigens. These strategies utilize binding characteristics, existing biological big data, and T-cell receptor specificity to ensure accurate detection and prioritization. We applied NeoHunter to the TESLA dataset, cohorts of melanoma and non-small cell lung cancer patients. NeoHunter achieved high performance across the TESLA cancer patients and detected 79% (27 out of 34) of validated neoantigens in total. SNV- and indel-derived neoantigens accounted for 90% of the top 100 candidate neoantigens, while neoantigens from aberrant splicing accounted for 9%. Gene fusion-derived neoantigens were detected in one patient. NeoHunter is a powerful tool to ‘catch all’ neoantigens and is available for free academic use on GitHub (XuegongLab/NeoHunter).
Funding: Supported by the National Natural Science Foundation of China (31271402 and 31100601) and the National Key Basic Research Program (2012CB316503).
Abstract: Protein binding is essential to the transport, decay and regulation of almost all RNA molecules. However, the structural preference of protein binding on RNAs and their cellular functions and dynamics upon changing environmental conditions are poorly understood. Here, we integrated various high-throughput data and introduced a computational framework to describe the global interactions between RNA binding proteins (RBPs) and structured RNAs in yeast at single-nucleotide resolution. We found that on average, in terms of percent total lengths, ~15% of mRNA untranslated regions (UTRs), ~37% of canonical non-coding RNAs (ncRNAs) and ~11% of long ncRNAs (lncRNAs) are bound by proteins. The RBP binding sites, in general, tend to occur at single-stranded loops, with evolutionarily conserved signatures, and often facilitate a specific RNA structure conformation in vivo. We found that four nucleotide modifications of tRNA are significantly associated with RBP binding. We also identified various structural motifs bound by RBPs in the UTRs of mRNAs, associated with localization, degradation and stress responses. Moreover, we identified >200 novel lncRNAs bound by RBPs, and about half of them contain conserved secondary structures. We present the first ensemble pattern of RBP binding sites in the structured non-coding regions of a eukaryotic genome, emphasizing their structural context and cellular functions.
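The "percent of total length bound" figures quoted in this abstract can be read as the fraction of nucleotides covered by at least one binding site. The sketch below computes that fraction for a single transcript by merging overlapping intervals. The half-open coordinates, the example sites and the function names are assumptions for illustration and do not reproduce the authors' framework.

```python
def merge_intervals(sites):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(sites):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of double-counting the overlap.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def bound_fraction(sites, length):
    """Fraction of a transcript of the given length covered by binding sites."""
    covered = sum(end - start for start, end in merge_intervals(sites))
    return covered / length

# Hypothetical binding sites on a 100-nt region.
sites = [(10, 20), (15, 30), (50, 55)]
fraction = bound_fraction(sites, length=100)  # 25 of 100 nt are covered
```

Merging first matters because summing raw site lengths would count the overlap between (10, 20) and (15, 30) twice.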
Funding: Financially supported by the State Basic Science Foundation 973 project (Nos. 2012CB725201 and 2012CB725200).
Abstract: Microbial polyhydroxyalkanoates (PHAs) are a family of biopolyesters produced by many wild-type and engineered bacteria. PHAs have diverse structures accompanied by flexible thermal and mechanical properties. Combined with their in vitro biodegradation and their cell and tissue compatibility, PHAs have been studied for medical applications, especially medical implant applications, including heart valve, vascular, bone, cartilage, nerve conduit and esophagus tissue engineering. Most studies have been conducted in the authors' lab in the past 20+ years. Recently, the mechanism of PHA-promoted tissue regeneration was revealed to relate to cell responses to PHA biodegradation products and to cell-material interactions mediated by microRNA. Very importantly, PHA implants were found not to cause carcinogenesis during long-term implantation. Thus, PHAs should have a bright future in biomedical areas.
Funding: Supported in part by the National Key R&D Program of China (2017YFC0910400), the National Natural Science Foundation of China (61721003), the Tsinghua-Fuzhou Institute for Data Technology (TFIDT2018006) and the China Postdoctoral Science Foundation (2020M670297).
Abstract: Colorectal cancer (CRC) progression is associated with cancer cell dedifferentiation and stemness acquisition. Several methods have been developed to identify stemness signatures in CRCs. However, studies that directly measure the degree of dedifferentiation in CRC tissues are limited, and it is unclear how the differentiation states change during CRC progression. To address this, we developed a method to analyze the tissue differentiation spectrum in colorectal cancer using normal gastrointestinal single-cell transcriptome data. Applying this method to 281 tumor samples from The Cancer Genome Atlas Colon Adenocarcinoma dataset, we identified three major CRC subtypes with distinct tissue differentiation patterns. We observed that differentiation states are closely correlated with anti-tumor immune response and patient outcomes in CRC. Highly dedifferentiated CRC samples escaped immune surveillance and exhibited poor outcomes; mildly dedifferentiated CRC samples showed resistance to anti-tumor immune responses and had a worse survival rate; well-differentiated CRC samples showed sustained anti-tumor immune responses and had a good prognosis. Overall, the spectrum of tissue differentiation observed in CRCs can be used for future clinical risk stratification and subtype-based therapy selection.
Abstract: The year 2021 is the 20th anniversary of the publication of the draft human genome [1,2]. The sequencing of the human genome has brought life sciences into a new era, the era of understanding the information systems of life in a quantitative manner. Such understanding lights up the bright future of individualized precision medicine, and enables the rational design of synthetic biological systems that can benefit mankind in broad aspects, from industry and agriculture to environment and health. Using the human genome sequence as a basic reference has become a routine practice in current biological and medical studies. It is so common that people almost take the existence of the reference genome for granted. But the launching and completion of the Human Genome Project (HGP) was far from a routine practice. It was revolutionary in the history of science in its scientific vision and technological advancement, as well as in the joint efforts of multiple disciplines to accomplish the big scientific goal, in starting the culture and building the infrastructure of data sharing, and in effective international multi-center collaboration.
Abstract: As a new inter- and multi-disciplinary forum for modeling, engineering and understanding life, our journal Quantitative Biology (QB) is celebrating its 6th anniversary and the start of a new Editorial Board in this issue. The past 6 years have evidenced tremendous progress in research on computational biology, systems biology and synthetic biology. Omics studies have expanded from genomes and transcriptomes to multiple aspects of epigenomes and their interactions. People's understanding and modeling of genomes is becoming 3D or 4D. The development of high-throughput single-cell genomics technologies has aroused huge enthusiasm for building the Human Cell Atlas as an ultimate reference for all future studies on humans.
Funding: We thank Dr. Hongfei Cui for her comments on the simulation design. This work is partially supported by the National Natural Science Foundation of China (Nos. 61673231 and 61721003).
Abstract: Background: Metagenomic sequencing is a complex sampling procedure from unknown mixtures of many genomes. Having metagenome data with known genome compositions is essential both for benchmarking bioinformatics software and for investigating the influences of various factors on the data. Compared to data from real microbiome samples or from defined microbial mock communities, simulated data with proper computational models are better for this purpose as they provide more flexibility for controlling multiple factors. Methods: We developed a non-uniform metagenomic sequencing simulation system (nuMetaSim) that is capable of mimicking various factors in real metagenomic sequencing to reflect multiple properties of real data with customizable parameter settings. Results: We generated 9 comprehensive metagenomic datasets with different composition complexity from 203 bacterial genomes and 2 archaeal genomes related to the human intestinal system. Conclusion: The data can serve as benchmarks for comparing the performance of different methods in different situations, and the software package allows users to generate simulation data that better reflect the specific properties of their scenarios.
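To make the idea of sampling reads from a mixture with known composition concrete, the sketch below draws the source genome of each simulated read with probability proportional to relative abundance times genome length. This is only one simple source of non-uniformity, assumed here for illustration; the genome names, parameter values and function names are hypothetical and do not describe nuMetaSim's actual model or interface.

```python
import random

def read_source_probs(abundance, genome_length):
    """Probability that a random read comes from each genome.

    Assumes reads are drawn uniformly from all DNA in the sample, so a genome
    contributes in proportion to (cell abundance x genome length).
    """
    weights = {g: abundance[g] * genome_length[g] for g in abundance}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

def sample_read_sources(abundance, genome_length, n_reads, seed=0):
    """Draw the source genome for each of n_reads simulated reads."""
    probs = read_source_probs(abundance, genome_length)
    genomes = list(probs)
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    return rng.choices(genomes, weights=[probs[g] for g in genomes], k=n_reads)

# Hypothetical two-genome community with equal cell abundance.
abundance = {"genome_a": 1, "genome_b": 1}
genome_length = {"genome_a": 1_000_000, "genome_b": 3_000_000}
probs = read_source_probs(abundance, genome_length)
sources = sample_read_sources(abundance, genome_length, n_reads=1000)
```

Because the true composition is an input, the read counts per genome in `sources` can be compared directly against what a profiling tool reports, which is the benchmarking use the abstract describes.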
Funding: Partially supported by the National Natural Science Foundation of China (NSFC) (Nos. 61721003 and 62250005), the National Key R&D Program of China (No. 2021YFF1200900) and the Tsinghua-Fuzhou Institute for Data Technology (No. TFIDT2021005).
Abstract: The rapid development of biological technology (BT) and information technology (IT), especially of genomics and artificial intelligence (AI), is bringing great potential for revolutionizing future medicine. We propose the concept and framework of Digital Life Systems, or dLife, as a new paradigm to unleash this potential. It includes the multi-scale and multi-granule measurement and representation of life in the digital space, the mathematical and/or computational modeling of the biology behind physiological and pathological processes, and ultimately cyber twins of the healthy or diseased human body in the virtual space that can be used to simulate complex biological processes and deduce the effects of medical treatments. We advocate that dLife is the route toward future AI precision medicine and should be the new paradigm for future biological and medical research.
Funding: Supported by the National High Technology Research and Development Program of China (2012AA023102 to Liu Lei, Guo Kai and Wu Qiong), the National Basic Research Program of China (2012CB725201 to Chen GuoQiang and Chen JinChun, 2012CB725204 to Guo Kai and Wu Qiong) and the National Natural Science Foundation of China (31270146 to Chen GuoQiang).
Abstract: Microbial synthesis of functional polymers has become increasingly important for industrial biotechnology. For the first time, it became possible to synthesize poly(3-hydroxyalkanoate) (P3HA) of controllable composition, consisting of 3-hydroxydodecanoate (3HDD) and monomers with a phenyl group on the side chain, when the chromosome of Pseudomonas entomophila was edited to weaken its β-oxidation. Cultured in the presence of 5-phenylvaleric acid (PVA), the edited P. entomophila produced only the homopolymer poly(3-hydroxy-5-phenylvalerate), or P(3HPhV), whereas copolyesters P(3HPhV-co-3HDD) of 3-hydroxy-5-phenylvalerate (3HPhV) and 3-hydroxydodecanoate (3HDD) were synthesized when the strain was grown on mixtures of PVA and dodecanoic acid (DDA). The 3HPhV content of P(3HPhV-co-3HDD) was controllable, ranging from 3% to 32% depending on the DDA/PVA ratio. Nuclear magnetic resonance (NMR) spectra clearly indicated that the polymers were a homopolymer of P(3HPhV) and random copolymers of 3HPhV and 3HDD. Their mechanical and thermal properties varied dramatically depending on the monomer ratios. Our results demonstrated the possibility of producing tailor-made, novel functional PHA using the chromosome-edited P. entomophila.
Funding: Supported by NIH under Grant Nos. ES017166 and HG001696.
Abstract: DNA methylation is a chemical modification of the bases in genomes. This modification, most frequently found at CpG dinucleotides in eukaryotes, has been identified as having multiple critical functions in a broad and diverse range of animal and plant species, while it mysteriously appears to be lacking in several other well-studied species. DNA methylation has well-known and important roles in genome stability and defense, and changes in its pattern correlate strongly with gene regulation. Much evidence has linked abnormal DNA methylation to human diseases. Most prominently, aberrant DNA methylation is a common feature of cancer genomes. Elucidating the precise functions of DNA methylation therefore has great biomedical significance. Here we provide an update on large-scale experimental technologies for detecting DNA methylation on a genomic scale. We also discuss new prospects and challenges that computational biologists will face when analyzing DNA methylation data.
Abstract: Dropout and other feature noising schemes have shown promise in controlling over-fitting by artificially corrupting the training data. Though extensive studies have been performed for generalized linear models, little has been done for support vector machines (SVMs), one of the most successful approaches for supervised learning. This paper presents dropout training for both linear SVMs and the nonlinear extension with latent representation learning. For linear SVMs, to deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively re-weighted least squares (IRLS) algorithm by exploring data augmentation techniques. Our algorithm iteratively minimizes the expectation of a re-weighted least squares problem, where the re-weights are analytically updated. For nonlinear latent SVMs, we consider learning one layer of latent representations in SVMs and extend the data augmentation technique in conjunction with a first-order Taylor expansion to deal with the intractable expected hinge loss and the nonlinearity of latent representations. Finally, we apply similar data augmentation ideas to develop a new IRLS algorithm for the expected logistic loss under corrupting distributions, and we further develop a nonlinear extension of logistic regression by incorporating one layer of latent representations. Our algorithms offer insights into the connection and difference between the hinge loss and the logistic loss in dropout training. Empirical results on several real datasets demonstrate the effectiveness of dropout training in significantly boosting the classification accuracy of both linear and nonlinear SVMs.
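To make the quantity in this abstract concrete, the sketch below estimates the expected hinge loss of a fixed linear classifier under dropout noise by brute-force sampling. It is not the paper's IRLS algorithm, which is designed to avoid sampling. The noise model (each feature zeroed with probability q, survivors rescaled by 1/(1-q)) and all function names are assumptions made for illustration.

```python
import random

def hinge(w, x, y):
    """Hinge loss max(0, 1 - y * <w, x>) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - margin)

def dropout(x, q, rng):
    """Assumed noise model: zero each feature with probability q (q < 1)
    and rescale the kept features by 1/(1-q), so the corrupted vector
    equals x in expectation."""
    keep = 1.0 - q
    return [xi / keep if rng.random() >= q else 0.0 for xi in x]

def expected_hinge(w, x, y, q, n_samples=10000, seed=0):
    """Monte Carlo estimate of the hinge loss averaged over dropout noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += hinge(w, dropout(x, q, rng), y)
    return total / n_samples

if __name__ == "__main__":
    w, x, y = [1.0, 1.0], [1.0, 1.0], 1
    print("clean loss:", hinge(w, x, y))
    print("estimated expected loss at q=0.5:", expected_hinge(w, x, y, 0.5))
```

In this toy case the clean loss is 0, while the exact expected loss at q = 0.5 is 0.25: both features are dropped with probability 0.25, which gives a margin of 0 and a loss of 1. The gap shows why the expectation of the hinge loss cannot be obtained by simply evaluating the loss at the uncorrupted input.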
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 61472205 and 81630103), the US National Science Foundation (Grant Nos. DBI-1262107 and IIS-1646333), China's Youth 1000-Talent Program, and the Beijing Advanced Innovation Center for Structural Biology.
Abstract: Pseudouridine (Ψ) is the most prevalent post-transcriptional RNA modification and is widespread in small cellular RNAs and mRNAs. However, the functions, mechanisms, and precise distribution of Ψs (especially in mRNAs) still remain largely unclear. The landscape of Ψs across the transcriptome has not yet been fully delineated. Here, we present a highly effective model based on a convolutional neural network (CNN), called PseudoUridyLation Site Estimator (PULSE), to analyze large-scale profiling data of Ψ sites and characterize the contextual sequence features of pseudouridylation. PULSE, consisting of two alternately stacked convolution and pooling layers followed by a fully connected neural network, can automatically learn the hidden patterns of pseudouridylation from the local sequence information. Extensive validation tests demonstrated that PULSE can outperform other state-of-the-art prediction methods and achieve high prediction accuracy, thus enabling us to further characterize the transcriptome-wide landscape of Ψ sites. We further showed that the prediction results derived from PULSE can provide novel insights into understanding the functional roles of pseudouridylation, such as the regulation of RNA secondary structure, codon usage, translation, and RNA stability, and the connection to single nucleotide variants. The source code and final model for PULSE are available at https://github.com/mlcb-thu/PULSE.
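The architecture named in this abstract (two convolution and pooling stages followed by a fully connected part) can be illustrated with a minimal forward pass. This is a sketch under stated assumptions, not the PULSE model: the 21-nucleotide window, the kernel widths, the number of filters, the single dense output unit, and the random untrained weights are all hypothetical choices made so the example runs with the standard library alone. The real code and trained model are at the URL given in the abstract.

```python
import math
import random

BASES = "ACGU"

def one_hot(seq):
    """Encode an RNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv1d_relu(x, kernels, width):
    """Valid 1D convolution along the sequence followed by ReLU.
    x has shape L x C_in; each kernel is a flat list of width * C_in weights."""
    out = []
    for i in range(len(x) - width + 1):
        window = [v for pos in x[i:i + width] for v in pos]
        out.append([max(0.0, sum(k * v for k, v in zip(kern, window)))
                    for kern in kernels])
    return out

def max_pool(x, size):
    """Non-overlapping max pooling along the sequence axis."""
    return [[max(col) for col in zip(*x[i:i + size])]
            for i in range(0, len(x) - size + 1, size)]

def random_kernels(rng, n_out, n_in):
    """Hypothetical untrained weights, drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def predict(seq, seed=0):
    """Score a sequence window with two conv+pool stages and one dense unit."""
    rng = random.Random(seed)
    x = one_hot(seq)
    x = max_pool(conv1d_relu(x, random_kernels(rng, 8, 5 * 4), 5), 2)
    x = max_pool(conv1d_relu(x, random_kernels(rng, 8, 3 * 8), 3), 2)
    flat = [v for pos in x for v in pos]
    weights = [rng.uniform(-0.5, 0.5) for _ in flat]
    score = sum(w * v for w, v in zip(weights, flat))
    return 1.0 / (1.0 + math.exp(-score))  # sigmoid maps the score to (0, 1)

if __name__ == "__main__":
    # Candidate uridine at the center, 10 flanking nucleotides on each side.
    print(predict("ACGUACGUACUACGUACGUAC"))
```

Because the weights are random, the printed score carries no biological meaning; the example only shows how local sequence context flows through the stacked layers to a single site-level probability.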