Funding: This research was funded by the National Key R&D Program of China (2022YFC2106000), the National Natural Science Foundation of China (32300529, 32201242, 12326611), the Tianjin Synthetic Biotechnology Innovation Capacity Improvement Projects (TSBICIP-PTJS-001, TSBICIP-PTJJ-007), the Major Program of Haihe Laboratory of Synthetic Biology (22HHSWSS00021), and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDC0120201).
Abstract: Proteins play a pivotal role in coordinating the functions of organisms, essentially governing their traits, as the dynamic arrangement of diverse amino acids leads to a multitude of folded configurations within peptide chains. Despite dynamic changes in the amino acid composition of an individual protein (referred to as AAP) and great variance in protein expression levels under different conditions, our study, utilizing transcriptomics data from four model organisms, uncovers surprising stability in the overall amino acid composition of the total cellular proteins (referred to as AACell). Although this value may vary between different species, we observed no significant differences among distinct strains of the same species. This indicates that organisms enforce system-level constraints to maintain a consistent AACell, even amid fluctuations in AAP and protein expression. Further exploration of this phenomenon promises insights into the intricate mechanisms orchestrating cellular protein expression and adaptation to varying environmental challenges.
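One way to read the AACell quantity described above is as an expression-weighted pool of amino acids across all proteins. The sketch below illustrates that reading only; the abstract does not give the authors' formula, so the weighting scheme, the function name `aa_cell`, and the toy sequences and expression values are all assumptions made for illustration.

```python
from collections import Counter


def aa_cell(proteins, expression):
    """Expression-weighted amino acid composition of the whole proteome.

    proteins:   dict mapping gene id -> amino acid sequence (hypothetical input)
    expression: dict mapping gene id -> expression level, e.g. TPM (hypothetical input)
    Returns a dict mapping amino acid -> fraction of the total weighted pool.
    """
    totals = Counter()
    for gene, sequence in proteins.items():
        # Genes without an expression value contribute nothing to the pool.
        weight = expression.get(gene, 0.0)
        for amino_acid, count in Counter(sequence).items():
            totals[amino_acid] += weight * count
    pool = sum(totals.values())
    if pool == 0:
        raise ValueError("no expressed proteins in the input")
    return {amino_acid: value / pool for amino_acid, value in totals.items()}


# Toy data, invented for illustration: two proteins with opposite compositions.
toy_proteins = {"geneA": "AAAG", "geneB": "GGGA"}
balanced = {"geneA": 10.0, "geneB": 10.0}
shifted = {"geneA": 20.0, "geneB": 0.0}

print(aa_cell(toy_proteins, balanced))
print(aa_cell(toy_proteins, shifted))
```

In this toy case each protein has a fixed composition (its AAP), and the pooled composition changes only when the relative expression of the two proteins changes. Stability of the pooled value under shifting expression is the kind of system-level constraint the abstract reports.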
Abstract: Objective: To use the gene chip of Pseudomonas aeruginosa as a research sample and to explore it at an omics level, aiming to elucidate the co-expression network characteristics of the virulence genes exoS and exoU of Pseudomonas aeruginosa in the lower respiratory tract from the perspective of molecular biology and to identify its key regulatory genes. Methods: From March 2016 to May 2018, 312 patients with Pseudomonas aeruginosa infection of the lower respiratory tract who were admitted to the Department of Respiratory Medicine of Baogang Hospital and received follow-up treatment in the hospital were selected as subjects by cluster sampling. Alveolar lavage fluid and sputum collected from those patients were used as biological specimens. The genes of Pseudomonas aeruginosa were detected with oligonucleotide probes for pre-processing of the chip data. A total of 8 common antibiotics against Gram-negative bacteria (ceftazidime, gentamicin, piperacillin, amikacin, ciprofloxacin, levofloxacin, doripenem and ticarcillin) were selected to determine the drug resistance of the biological specimens. The MCODE algorithm was used to construct a co-expression network model of the drug-resistance genes focused on exoS/exoU. Results: The expression level of exoS/exoU in the drug-resistance group was significantly higher than that in the non-resistance group (p<0.05). The top 5 differentially expressed genes in the alveolar lavage fluid specimens from the drug-resistance group were, from high to low, RAC1, ITGB1, ITGB5, CRK and IGF1R. In the sputum specimens, the top 5 differentially expressed genes were RAC1, CRK, IGF1R, ITGB1 and ITGB5. In the alveolar lavage fluid specimens, only RAC1 was positively correlated with the expression of exoS and exoU (p<0.05). In the sputum specimens, RAC1, ITGB1, ITGB5, CRK and IGF1R were positively correlated with the expression of exoS and exoU (p<0.05). The co-expression network contained exoS, exoU, RAC1, ITGB1, ITGB5, CRK, CAMK2D, RHOA, FLNA, IGF1R, TGFBR2 and FOS. Among them, RAC1 had the highest score for regulatory ability (72.00) and the largest number of regulatory genes (6), followed by ITGB1, ITGB5 and CRK. Conclusions: The high expression of exoS and exoU in the sputum specimens suggests that Pseudomonas aeruginosa is more likely to become resistant to antibiotics; RAC1, ITGB1, ITGB5 and CRK may be the key genes regulating the expression of exoS and exoU.
Abstract: Gastrointestinal (GI) cancers are a set of diverse diseases affecting many parts/organs. The five most frequent GI cancer types are esophageal cancer, gastric cancer (GC), liver cancer, pancreatic cancer, and colorectal cancer (CRC); together, they give rise to 5 million new cases and cause the death of 3.5 million people annually. We provide information about molecular changes crucial to tumorigenesis and to tumor behavior and prognosis. During the formation of cancer cells, the genomic changes are microsatellite instability with multiple chromosomal arrangements in GC and CRC. The genomically stable subtype is observed in GC and pancreatic cancer. Besides these genomic subtypes, CRC has epigenetic modification (hypermethylation) associated with a poor prognosis. The pathway information highlights the functions shared by GI cancers, such as apoptosis; focal adhesion; and the p21-activated kinase, phosphoinositide 3-kinase/Akt, transforming growth factor beta, and Toll-like receptor signaling pathways. These pathways relate to survival, cell proliferation, and cell motility. In addition, the immune response and inflammation are essential elements in the shared functions. We also retrieved information on protein-protein interaction from the STRING database and found that the proteins Akt1, catenin beta 1 (CTNNB1), E1A binding protein P300 (EP300), tumor protein p53 (TP53), and TP53 binding protein 1 (TP53BP1) are central nodes in the network. The protein expression of these genes is associated with overall survival in some GI cancers. Low TP53BP1 expression in CRC, high EP300 expression in esophageal cancer, and increased expression of Akt1/TP53 or low CTNNB1 expression in GC are associated with a poor prognosis. The Kaplan Meier plotter database also confirmed the association between expression of the five central genes and GC survival rates. In conclusion, GI cancers are very diverse at the molecular level. However, the shared mutations and protein pathways might be used to better understand these cancers and to reveal diagnostic/prognostic or drug targets.
Funding: Supported by the National Natural Science Foundation Projects of China (No. 82350003, No. 92049201).
Abstract: The convergence of artificial intelligence (AI) and microbial therapeutics offers promising avenues for novel discoveries and therapeutic interventions. With the exponential growth of omics datasets and rapid advancements in AI technology, the next generation of AI is increasingly prevalent in microbiology research. In microbial research, AI is instrumental in the classification and functional annotation of microorganisms. Machine learning algorithms facilitate efficient and accurate categorization of microbial taxa, enabling the identification of functional traits and metabolic pathways within microbial communities. Additionally, AI-driven protein design strategies hold promise for engineering enzymes with enhanced catalytic activities and stabilities. By predicting protein structures, functions, and interactions, AI algorithms enable the rational design of proteins and enzymes tailored for specific applications. AI systems are already present in clinical microbiology laboratories in the form of expert rules used by some automated susceptibility testing and identification systems. In the future, microbiology technologists will rely more heavily on AI for initial screening, allowing them to focus on diagnostic challenges and complex technical interpretations. AI-driven approaches hold immense promise in advancing our understanding of microbial ecosystems, accelerating drug discovery processes, and fostering the development of groundbreaking therapeutic interventions. This review aims to summarize common algorithms in AI and their applications within microbiology and synthetic biology. We provide a comprehensive evaluation of AI’s utility in microbial research, discussing both its advantages and challenges. Finally, we explore future research directions and the bottlenecks faced by AI in the microbial field.
Abstract: Explainable artificial intelligence aims to interpret how machine learning models make decisions, and many model explainers have been developed in the computer vision field. However, understanding of the applicability of these model explainers to biological data is still lacking. In this study, we comprehensively evaluated multiple explainers by interpreting pre-trained models for predicting tissue types from transcriptomic data and by identifying the top contributing genes from each sample with the greatest impacts on model prediction. To improve the reproducibility and interpretability of results generated by model explainers, we proposed a series of optimization strategies for each explainer on two different model architectures, multilayer perceptron (MLP) and convolutional neural network (CNN). We observed three groups of explainer and model architecture combinations with high reproducibility. Group II, which contains three model explainers on aggregated MLP models, identified top contributing genes in different tissues that exhibited tissue-specific manifestation and were potential cancer biomarkers. In summary, our work provides novel insights and guidance for exploring biological mechanisms using explainable machine learning models.
Funding: Supported by the National Key R&D Program of China (2021YFA1302100 to Q.Z), the National Natural Science Foundation of China (82172861 to Q.Z), the Guangdong Basic and Applied Basic Research Foundation (2021A1515011743 to Q.Z), and the National Key Clinical Discipline (to D.Z).
Abstract: Screening biomolecular markers from high-dimensional biological data is one of the long-standing tasks in biomedical translational research. With its advantages in both feature shrinkage and biological interpretability, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm is one of the most popular methods for clinical biomarker development. However, in practice, applying LASSO to omics-based data with high dimensions and low sample size often results in an excess number of predictive variables, leading to overfitting of the model. Here, we present VSOLassoBag, a wrapped LASSO approach that integrates an ensemble learning strategy to help select efficient and stable variables with high confidence from omics-based data. Using a bagging strategy in combination with a parametric method or an inflection point search method, VSOLassoBag can integrate and vote on variables generated from multiple LASSO models to determine the optimal candidates. The application of VSOLassoBag to both simulation datasets and real-world datasets shows that the algorithm can effectively identify markers for either case-control binary classification or prognosis prediction. In addition, in comparison with multiple existing algorithms, VSOLassoBag shows comparable performance under different scenarios while resulting in fewer features than the others. In summary, VSOLassoBag, which is available at https://seqworld.com/VSOLassoBag/ under the GPL v3 license, provides an alternative strategy for selecting reliable biomarkers from high-dimensional omics data. For users' convenience, we implement VSOLassoBag as an R package that provides multithreading computing configurations.
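The bagging-and-voting idea described above can be sketched in a few lines. This is a generic illustration in Python, not the VSOLassoBag R package or its API. Every function name is hypothetical and the data are simulated. The sketch fits LASSO by plain coordinate descent with a fixed penalty and no intercept. It also swaps in a fixed selection-frequency cutoff as a simplification, in place of the parametric or inflection point search method the abstract mentions.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual excluding its own fit.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def lasso_bag_votes(X, y, lam, n_bags=50, seed=0):
    """Fit LASSO on bootstrap resamples; count how often each feature is kept."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [0] * p
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, coef in enumerate(beta):
            if coef != 0.0:
                votes[j] += 1
    return votes


def select_by_frequency(votes, n_bags, min_freq=0.8):
    """Keep features selected in at least min_freq of the bags (assumed cutoff)."""
    return [j for j, v in enumerate(votes) if v / n_bags >= min_freq]


def make_toy_data(n=40, p=5, seed=1):
    """Simulated data: only feature 0 carries signal, the rest are noise."""
    rng = random.Random(seed)
    X = [[rng.choice((-1.0, 1.0))] + [rng.gauss(0, 1) for _ in range(p - 1)]
         for _ in range(n)]
    y = [3.0 * row[0] + rng.uniform(-0.1, 0.1) for row in X]
    return X, y


X, y = make_toy_data()
votes = lasso_bag_votes(X, y, lam=0.5)
print("votes per feature:", votes)
print("selected features:", select_by_frequency(votes, n_bags=50))
```

Counting votes across resamples favors variables that survive shrinkage consistently over those picked up by a single fit. This corresponds to the stability argument the abstract makes for high-dimensional, low-sample-size data.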