Gastric cancer (GC), the fifth most common cancer globally, remains a leading cause of cancer deaths worldwide. Inflammation-induced tumorigenesis is the predominant process in GC development; therefore, systematic research in this area should improve understanding of the biological mechanisms that initiate GC development and promote cancer hallmarks. Here, we summarize biological knowledge regarding gastric inflammation-induced tumorigenesis, and characterize the multi-omics data and systems biology methods for investigating GC development. Of note, we highlight pioneering studies in multi-omics data and state-of-the-art network-based algorithms used for dissecting the features of gastric inflammation-induced tumorigenesis, and we propose translational applications in early GC warning biomarkers and precise treatment strategies. This review offers integrative insights for GC research, with the goal of paving the way to novel paradigms for GC precision oncology and prevention.
The identification of tumor driver genes facilitates accurate cancer diagnosis and treatment, playing a key role in precision oncology, along with gene signaling, regulation, and their interaction with protein complexes. To tackle the challenge of distinguishing driver genes from a large amount of genomic data, we construct a feature extraction framework for discovering pan-cancer driver genes based on multi-omics data (mutations, gene expression, copy number variants, and DNA methylation) combined with protein–protein interaction (PPI) networks. Using a network propagation algorithm, we mine functional information among nodes in the PPI network, focusing on genes with weak node information to represent specific cancer information. From these functional features, we extract distribution features of pan-cancer data, pan-cancer TOPSIS features of the functional features using the ideal solution method, and SetExpan features of pan-cancer data from the gene functional features (SetExpan ranks pan-cancer data by average inverse rank). These features represent the information common across cancer types. Finally, we use the LightGBM classification algorithm for gene prediction. Experimental results show that our method outperforms existing methods in terms of the area under the precision-recall curve (AUPRC) and demonstrates better performance across different PPI networks. This indicates our framework's effectiveness in predicting potential cancer genes, offering valuable insights for the diagnosis and treatment of tumors.
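The network propagation step can be illustrated with a minimal sketch. The abstract does not say which propagation scheme is used, so the example below assumes a random walk with restart, one common choice; the function name `propagate`, the `restart` parameter, and the toy gene names G1 to G4 are all hypothetical and not taken from the paper.

```python
def propagate(adj, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adj   : dict mapping each node to a list of its neighbours (symmetric)
    seeds : set of nodes carrying the initial signal (e.g. mutated genes)
    Returns a dict of steady-state scores that sum to 1.
    """
    nodes = sorted(adj)
    # Initial distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Each step: restart to the seeds with probability `restart` ...
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            nbrs = adj[n]
            if not nbrs:
                # An isolated node keeps its remaining mass.
                nxt[n] += (1.0 - restart) * p[n]
                continue
            # ... otherwise spread the score equally over the neighbours.
            share = (1.0 - restart) * p[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy PPI network (hypothetical): a simple path G1 - G2 - G3 - G4.
toy_ppi = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2", "G4"], "G4": ["G3"]}
scores = propagate(toy_ppi, seeds={"G1"})
```

In this toy network, genes closer to the seed receive higher scores, which is the sense in which propagation lets a gene with weak node information of its own inherit signal from its network neighbourhood.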
Genome-wide association studies (GWASs) have identified over 140 colorectal cancer (CRC)-associated loci; however, target genes at the majority of loci and underlying molecular mechanisms are poorly understood. Here, we utilized a Bayesian approach, integrative risk gene selector (iRIGS), to prioritize risk genes at CRC GWAS loci by integrating multi-omics data. As a result, a total of 105 high-confidence risk genes (HRGs) were identified, which exhibited strong gene dependencies for CRC and enrichment in the biological processes implicated in CRC. Among the 105 HRGs, CEBPB, located at the 20q13.13 locus, acted as a transcription factor playing critical roles in cancer. Our subsequent assays indicated the tumor promoter function of CEBPB that facilitated CRC cell proliferation by regulating multiple oncogenic pathways such as MAPK, PI3K-Akt, and Ras signaling. Next, by integrating a fine-mapping analysis and three independent case-control studies in Chinese populations consisting of 8,039 cases and 12,775 controls, we elucidated that rs1810503, a putative functional variant regulating CEBPB, was associated with CRC risk (OR = 0.90, 95% CI = 0.86–0.93, P = 1.07 × 10^(−7)). The association between rs1810503 and CRC risk was further validated in three additional multi-ancestry populations consisting of 24,254 cases and 58,741 controls. Mechanistically, the rs1810503 A to T allele change weakened the enhancer activity in an allele-specific manner to decrease CEBPB expression via long-range promoter-enhancer interactions, mediated by the transcription factor REST, and thus decreased CRC risk. In summary, our study provides a genetic resource and a generalizable strategy for CRC etiology investigation, and highlights the biological implications of CEBPB in CRC tumorigenesis, shedding new light on the etiology of CRC.
Metabolic network construction plays a pivotal role in unraveling the regulatory mechanism of biological activities, although it often proves to be challenging and labor-intensive, particularly with non-model organisms. In this study, we develop a computational approach that employs reaction models based on the structure-guided chemical modification and related compounds to construct a metabolic network in wheat. This construction results in a comprehensive structure-guided network, including 625 identified metabolites and an additional 333 putative reactions compared with the Kyoto Encyclopedia of Genes and Genomes database. Using a combination of gene annotation, reaction classification, structure similarity, and correlations from transcriptome and metabolome analysis, a total of 229 potential genes related to these reactions are identified within this network. To validate the network, the functionality of a hydroxycinnamoyltransferase (TraesCS3D01G314900) for the synthesis of polyphenols and a rhamnosyltransferase (TraesCS2D01G078700) for the modification of flavonoids are verified through in vitro enzymatic studies and wheat mutant tests, respectively. Our research thus supports the utility of structure-guided chemical modification as an effective tool in identifying causal candidate genes for constructing metabolic networks and further in metabolomic genetic studies.
Childhood asthma is one of the most common respiratory diseases, with rising mortality and morbidity. Multi-omics data provide a new opportunity to explore collaborative biomarkers and corresponding diagnostic models of childhood asthma. To capture the nonlinear association of multi-omics data and improve the interpretability of the diagnostic model, we proposed a novel deep association model (DAM) and a corresponding efficient analysis framework. First, Deep Subspace Reconstruction was used to fuse the omics data and diagnostic information, thereby correcting the distribution of the original omics data and reducing the influence of unnecessary data noise. Second, Joint Deep Semi-Negative Matrix Factorization was applied to identify different latent sample patterns and extract biomarkers from different omics data levels. Third, our newly proposed Deep Orthogonal Canonical Correlation Analysis can rank features in the collaborative module, which can then be used to construct a diagnostic model that accounts for nonlinear correlation between different omics data levels. Using DAM, we analyzed in depth the transcriptome and methylation data of childhood asthma. The effectiveness of DAM is verified from the perspectives of algorithm performance and biological significance on the independent test dataset, by ablation experiment and comparison with many baseline methods from clinical and biological studies. The DAM-induced diagnostic model can achieve a prediction AUC of 0.912, which is higher than that of many other alternative methods. Meanwhile, relevant pathways and biomarkers of childhood asthma are also recognized to be collectively altered on the gene expression and methylation levels. As an interpretable machine learning approach, DAM simultaneously considers the non-linear associations among samples and those among biological features, which should help explore interpretative biomarker candidates and efficient diagnostic models from multi-omics data analysis for human complex diseases.
Single-cell multi-omics sequencing has greatly accelerated reproductive research in recent years, and the data are continually growing. However, utilizing these data resources is challenging for wet-lab researchers. A comprehensive platform for exploring single-cell multi-omics data related to reproduction is urgently needed. Here, we introduce the single-cell multi-omics atlas of reproduction (SMARTdb), an integrative and user-friendly platform for exploring molecular dynamics of reproductive development, aging, and disease, which covers multi-omics, multi-species, and multi-stage data. We curated and analyzed single-cell transcriptomic and epigenomic data of over 2.0 million cells from 6 species across the entire lifespan. A series of powerful functionalities are provided, such as “Query gene expression”, “DIY expression plot”, “DNA methylation plot”, and “Epigenome browser”. With SMARTdb, we found that the male germ cell-specific expression pattern of RPL39L and RPL10L is conserved between human and other model animals. Moreover, DNA hypomethylation and open chromatin may collectively regulate the specific expression pattern of RPL39L in both male and female germ cells. In summary, SMARTdb is a powerful platform for convenient data mining and gaining novel insights into reproductive development, aging, and disease. SMARTdb is publicly available at https://smart-db.cn.
Genomic prediction is an effective way to accelerate the rate of agronomic trait improvement in plants. Traditional methods typically use linear regression models with clear assumptions; such methods are unable to capture the complex relationships between genotypes and phenotypes. Non-linear models (e.g., deep neural networks) have been proposed as a superior alternative to linear models because they can capture complex non-additive effects. Here we introduce a deep learning (DL) method, deep neural network genomic prediction (DNNGP), for integration of multi-omics data in plants. We trained DNNGP on four datasets and compared its performance with methods built with five classic models: genomic best linear unbiased prediction (GBLUP); two methods based on a machine learning (ML) framework, light gradient boosting machine (LightGBM) and support vector regression (SVR); and two methods based on a DL framework, deep learning genomic selection (DeepGS) and deep learning genome-wide association study (DLGWAS). DNNGP is novel in five ways. First, it can be applied to a variety of omics data to predict phenotypes. Second, the multilayered hierarchical structure of DNNGP dynamically learns features from raw data, avoiding overfitting and improving the convergence rate using a batch normalization layer, early stopping, and rectified linear unit (ReLU) activation functions. Third, when small datasets were used, DNNGP produced results that are competitive with results from the other five methods, and it showed greater prediction accuracy than the other methods when large-scale breeding data were used. Fourth, the computation time required by DNNGP was comparable with that of commonly used methods, and up to 10 times faster than DeepGS. Fifth, hyperparameters can easily be batch tuned on a local machine. Compared with GBLUP, LightGBM, SVR, DeepGS and DLGWAS, DNNGP is superior to these existing widely used genomic selection (GS) methods. Moreover, DNNGP can generate robust assessments from diverse datasets, including omics data, and quickly incorporate complex and large datasets into usable models, making it a promising and practical approach for straightforward integration into existing GS platforms.
Background: Chromatin-associated RNA (caRNA) acts as a ubiquitous epigenetic layer in eukaryotes, and has been reported to be essential in various biological processes, including gene transcription, chromatin remodeling and cellular differentiation. Recently, numerous experimental techniques have been developed to characterize genome-wide RNA-chromatin interactions to understand their underlying biological functions. However, these experimental methods are generally expensive, time-consuming, and limited in identifying all potential sites, while most of the existing computational methods are restricted to detecting only specific types of RNAs interacting with chromatin. Methods: Here, we propose a highly interpretable computational framework, named DeepRCI, to identify the interactions between various types of RNAs and chromatin. In this framework, we introduce a novel deep learning component called variformer and integrate multi-omics data to capture intrinsic genomic features at both RNA and DNA levels. Results: Extensive experiments demonstrate that DeepRCI can detect RNA-chromatin interactions more accurately when compared to the state-of-the-art baseline prediction methods. Furthermore, the sequence features extracted by DeepRCI can be well matched to known critical gene regulatory components, indicating that our model can provide useful biological insights into understanding the underlying mechanisms of RNA-chromatin interactions. In addition, based on the prediction results, we further delineate the relationships between RNA-chromatin interactions and cellular functions, including gene expression and the modulation of cell states. Conclusions: In summary, DeepRCI can serve as a useful tool for characterizing RNA-chromatin interactions and studying the underlying gene regulatory code.
Bioinformatic analysis of large and complex omics datasets has become increasingly useful in modern day biology by providing a great depth of information, with its application to neuroscience termed neuroinformatics. Data mining of omics datasets has enabled the generation of new hypotheses based on differentially regulated biological molecules associated with disease mechanisms, which can be tested experimentally for improved diagnostic and therapeutic targeting of neurodegenerative diseases. Importantly, integrating multi-omics data using a systems bioinformatics approach will advance the understanding of the layered and interactive network of biological regulation that exchanges systemic knowledge to facilitate the development of a comprehensive human brain profile. In this review, we first summarize data mining studies utilizing datasets from individual types of omics analysis, including epigenetics/epigenomics, transcriptomics, proteomics, metabolomics, lipidomics, and spatial omics, pertaining to Alzheimer's disease, Parkinson's disease, and multiple sclerosis. We then discuss multi-omics integration approaches, including independent biological integration and unsupervised integration methods, for more intuitive and informative interpretation of the biological data obtained across different omics layers. We further assess studies that integrate multi-omics in data mining which provide convoluted biological insights and offer proof-of-concept proposition towards systems bioinformatics in the reconstruction of brain networks. Finally, we recommend a combination of high dimensional bioinformatics analysis with experimental validation to achieve translational neuroscience applications including biomarker discovery, therapeutic development, and elucidation of disease mechanisms. We conclude by providing future perspectives and opportunities in applying integrative multi-omics and systems bioinformatics to achieve precision phenotyping of neurodegenerative diseases and towards personalized medicine.
In the post-genome-wide association study era, multi-omics techniques have shown great power and potential for candidate gene mining and functional genomics research. However, due to the lack of effective data integration and multi-omics analysis platforms, such techniques have still not been widely applied in rapeseed, an important oil crop worldwide. Here, we report a rapeseed multi-omics database (BnIR; http://yanglab.hzau.edu.cn/BnIR), which provides datasets of six omics including genomics, transcriptomics, variomics, epigenetics, phenomics, and metabolomics, as well as numerous "variation-gene expression-phenotype" associations by using multiple statistical methods. In addition, a series of multi-omics search and analysis tools are integrated to facilitate the browsing and application of these datasets. BnIR is the most comprehensive multi-omics database for rapeseed so far, and two case studies demonstrated its power to mine candidate genes associated with specific traits and analyze their potential regulatory mechanisms.
Background: Physiological and biochemical processes across tissues of the body are regulated in response to the high demands of intense physical activity in several occupations, such as firefighting, law enforcement, military, and sports. A better understanding of such processes can ultimately help improve human performance and prevent illnesses in the work environment. Methods: To study regulatory processes in intense physical activity simulating real-life conditions, we performed a multi-omics analysis of 3 biofluids (blood plasma, urine, and saliva) collected from 11 wildland firefighters before and after a 45-min intense exercise regimen. Omics profiles post- vs. pre-exercise were compared by Student's t-test followed by pathway analysis and comparison between the different omics modalities. Results: Our multi-omics analysis identified and quantified 3835 proteins, 730 lipids and 182 metabolites combining the 3 different types of samples. The blood plasma analysis revealed signatures of tissue damage and acute repair response accompanied by enhanced carbon metabolism to meet energy demands. The urine analysis showed a strong, concomitant regulation of 6 out of 8 identified proteins from the renin-angiotensin system supporting increased excretion of catabolites, reabsorption of nutrients and maintenance of fluid balance. In saliva, we observed a decrease in 3 pro-inflammatory cytokines and an increase in 8 antimicrobial peptides. A systematic literature review identified 6 papers that support an altered susceptibility to respiratory infection. Conclusions: This study shows simultaneous regulatory signatures in biofluids indicative of homeostatic maintenance during intense physical activity with possible effects on increased infection susceptibility, suggesting that caution against respiratory diseases could benefit workers in highly physically demanding jobs.
Accurate genomic information is essential for advancing genetic breeding research in specific rice varieties. This study presented a gapless genome assembly of the indica rice cultivar Zhonghui 8015 (ZH8015) using PacBio HiFi, Hi-C, and ONT (Oxford Nanopore Technologies) ultra-long sequencing technologies, annotating 43,037 gene structures. Subsequently, utilizing this genome along with transcriptomic and metabolomic techniques, we explored ZH8015's response to brown planthopper (BPH) infestation. Continuous transcriptomic sampling indicated significant changes in gene expression levels around 48 h after BPH feeding. Enrichment analysis revealed particularly significant alterations in genes related to reactive oxygen species scavenging and cell wall formation. Metabolomic results demonstrated marked increases in levels of several monosaccharides, which are components of the cell wall, and dramatic changes in flavonoid contents. Omics association analysis identified differentially expressed genes associated with key metabolites, shedding light on ZH8015's response to BPH infestation. In summary, this study constructed a reliable genome sequence resource for ZH8015, and the preliminary multi-omics results will guide future insect-resistant breeding research.
Background: Single-cell multi-omics technologies allow a profound system-level biology understanding of cells and tissues. However, an integrative and possibly systems-based analysis capturing the different modalities is challenging. In response, bioinformatics and machine learning methodologies are being developed for multi-omics single-cell analysis. It is unclear whether current tools can address the dual aspect of modality integration and prediction across modalities without requiring extensive parameter fine-tuning. Methods: We designed LIBRA, a neural network based framework, to learn translation between paired multi-omics profiles so that a shared latent space is constructed. Additionally, we implemented a variation, aLIBRA, that allows automatic fine-tuning by identifying parameter combinations that optimize both the integrative and predictive tasks. All model parameters and evaluation metrics are made available to users with minimal user iteration. Furthermore, aLIBRA allows experienced users to implement custom configurations. The LIBRA toolbox is freely available as R and Python libraries at GitHub (TranslationalBioinformaticsUnit/LIBRA). Results: LIBRA was evaluated in eight multi-omic single-cell datasets, including three combinations of omics. We observed that LIBRA is a state-of-the-art tool when evaluating the ability to increase cell-type (clustering) resolution in the integrated latent space. Furthermore, when assessing the predictive power across data modalities, such as predicting chromatin accessibility from gene expression, LIBRA outperforms existing tools. As expected, adaptive parameter optimization (aLIBRA) significantly boosted the performance of learning predictive models from paired datasets. Conclusion: LIBRA is a versatile tool that performs competitively in both “integration” and “prediction” tasks based on single-cell multi-omics data. LIBRA is a data-driven robust platform that includes an adaptive learning scheme.
Background: Presently, multi-omics data (e.g., genomics, transcriptomics, proteomics, and metabolomics) are available to improve genomic predictors. Omics data not only offers new data layers for genomic prediction but also provides a bridge between organismal phenotypes and genome variation that cannot be readily captured at the genome sequence level. Therefore, using multi-omics data to select feature markers is a feasible strategy to improve the accuracy of genomic prediction. In this study, simultaneously using whole-genome sequencing (WGS) and gene expression level data, four strategies for single-nucleotide polymorphism (SNP) preselection were investigated for genomic predictions in the Drosophila Genetic Reference Panel. Results: Using genomic best linear unbiased prediction (GBLUP) with complete WGS data, the prediction accuracies were 0.208 ± 0.020 (0.181 ± 0.022) for the startle response and 0.272 ± 0.017 (0.307 ± 0.015) for starvation resistance in the female (male) lines. Compared with GBLUP using complete WGS data, both GBLUP and the genomic feature BLUP (GFBLUP) did not improve the prediction accuracy using SNPs preselected from complete WGS data based on the results of genome-wide association studies (GWASs) or transcriptome-wide association studies (TWASs). Furthermore, by using SNPs preselected from the WGS data based on the results of the expression quantitative trait locus (eQTL) mapping of all genes, only the startle response had greater accuracy than GBLUP with the complete WGS data. The best accuracy values in the female and male lines were 0.243 ± 0.020 and 0.220 ± 0.022, respectively. Importantly, by using SNPs preselected based on the results of the eQTL mapping of significant genes from TWAS, both GBLUP and GFBLUP resulted in great accuracy and small bias of genomic prediction. Compared with the GBLUP using complete WGS data, the best accuracy values represented increases of 60.66% and 39.09% for the starvation resistance and 27.40% and 35.36% for startle response in the female and male lines, respectively. Conclusions: Overall, multi-omics data can assist genomic feature preselection and improve the performance of genomic prediction. The new knowledge gained from this study will enrich the use of multi-omics in genomic prediction.
Natural rubber (NR) is an irreplaceable biopolymer of economic and strategic importance owing to its unique physical and chemical properties. The Pará rubber tree (Hevea brasiliensis (Willd. ex A. Juss.) Müll. Arg.) is currently the exclusive commercial source of NR, and it is primarily grown in plantations restricted to the tropical and subtropical areas of Southeast Asia. However, current Pará rubber production barely meets the sharply increasing global industrial demand for rubber. Petroleum-based synthetic rubber (SR) has been used to supplement the shortage of NR but its industrial performance is not comparable to that of NR. Thus, there is an urgent need to develop new productive rubber crops with broader environmental adaptability. This review summarizes the current research progress on alternative rubber-producing plants, including horticultural plants (Taraxacum kok-saghyz Rodin and Lactuca L. species), woody plants (Parthenium argentatum A. Gray and Eucommia ulmoides Oliv.), and other plant species with potential for NR production. With an emphasis on the molecular basis of NR biosynthesis revealed by a multi-omics approach, we highlight new integrative strategies and biotechnologies for exploring the mechanism of NR biosynthesis with a broader scope, which may accelerate the breeding and improvement of new rubber crops.
Recent studies have highlighted spatially resolved multi-omics technologies, including spatial genomics, transcriptomics, proteomics, and metabolomics, as powerful tools to decipher the spatial heterogeneity of the brain. Here, we focus on two major approaches in spatial transcriptomics (next-generation sequencing-based technologies and image-based technologies), and mass spectrometry imaging technologies used in spatial proteomics and spatial metabolomics. Furthermore, we discuss their applications in neuroscience, including building the brain atlas, uncovering gene expression patterns of neurons for special behaviors, deciphering the molecular basis of neuronal communication, and providing a more comprehensive explanation of the molecular mechanisms underlying central nervous system disorders. However, further efforts are still needed toward the integrative application of multi-omics technologies, including the real-time spatial multi-omics analysis in living cells, the detailed gene profile in a whole-brain view, and the combination of functional verification.
With the development of Industry 4.0 and big data technology, the Industrial Internet of Things (IIoT) is hampered by inherent issues such as privacy, security, and fault tolerance, which pose certain challenges to the rapid development of IIoT. Blockchain technology has immutability, decentralization, and autonomy, which can greatly improve the inherent defects of the IIoT. In the traditional blockchain, data is stored in a Merkle tree. As data continues to grow, the scale of proofs used to validate it grows, threatening the efficiency, security, and reliability of blockchain-based IIoT. Accordingly, this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data. To solve this problem, a new Vector Commitment (VC) structure, Partition Vector Commitment (PVC), is proposed by improving the traditional VC structure. Secondly, this paper uses PVC instead of the Merkle tree to store big data generated by IIoT. PVC can improve the efficiency of traditional VC in the process of commitment and opening. Finally, this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative analysis of experiments. This mechanism can greatly reduce communication loss and maximize the rational use of storage space, which is of great significance for maintaining the security and stability of blockchain-based IIoT.
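The baseline that this abstract argues against can be made concrete with a small sketch of a Merkle tree inclusion proof. This illustrates only the traditional structure, not the proposed PVC scheme; the function names, the use of SHA-256, and the rule of duplicating the last node on odd-sized levels are assumptions made for the example.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """SHA-256 digest used for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from hashed leaves up to the single root."""
    level = [_h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last node with itself
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect the sibling hash at every level below the root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its sibling path."""
    node = _h(leaf)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


# Hypothetical IIoT sensor readings used as leaves.
readings = [f"sensor-{i}:value".encode() for i in range(8)]
levels = build_levels(readings)
root = levels[-1][0]
proof = make_proof(levels, 5)
```

With 8 leaves the proof holds 3 sibling hashes, and in general about log2(n) hashes for n leaves, so proof size and verification work grow as more data is committed. That growth is the cost the abstract refers to.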
In order to address the problems of the single encryption algorithm, such as low encryption efficiency and unreliable metadata for static data storage of big data platforms in the cloud computing environment, we propose a Hadoop-based big data secure storage scheme. Firstly, in order to disperse the NameNode service from a single server to multiple servers, we combine HDFS federation and HDFS high-availability mechanisms, and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage. Then, we improve the ECC encryption algorithm for the encryption of ordinary data, and adopt a homomorphic encryption algorithm to encrypt data that needs to be calculated. To accelerate the encryption, we adopt the dual-thread encryption mode. Finally, the HDFS control module is designed to combine the encryption algorithm with the storage model. Experimental results show that the proposed solution solves the problem of a single point of failure of metadata, performs well in terms of metadata reliability, and can realize the fault tolerance of the server. The improved encryption algorithm integrates the dual-channel storage mode, and the encryption storage efficiency improves by 27.6% on average.
Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for data augmentation and for converting time series data into images for analysis have been studied. This paper proposes a fault detection model that uses time series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. Data augmentation is performed by adding Gaussian noise, with the noise level set to 0.002, to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to effectively visualize the dynamic transitions of the data while converting the time series data into images. This enables the identification of patterns in time series data and assists in capturing the sequential dependencies of the data. For anomaly detection, the PatchCore model is applied and shows excellent performance, and the detected anomaly areas are represented as heat maps. By applying an anomaly map to the original image, it is possible to capture the areas where anomalies occur. The performance evaluation shows that both F1-score and accuracy are high when time series data is converted to images. Additionally, when the data were processed as images rather than as time series, there was a significant reduction in both the size of the data and the training time. The proposed method can provide an important springboard for research in the field of anomaly detection using time series data. It also helps solve problems such as analyzing complex patterns in data in a lightweight manner.
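The augmentation and image-conversion steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the quantile binning, and the default of 4 bins are assumptions made here; only the Gaussian noise level of 0.002 is taken from the abstract.

```python
import random


def add_gaussian_noise(series, sigma=0.002, seed=0):
    """Return a copy of the series with zero-mean Gaussian noise added.

    sigma=0.002 mirrors the noise level quoted in the abstract; the fixed
    seed is an assumption that only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]


def quantile_bins(series, n_bins):
    """Assign each point to one of n_bins roughly equal-sized quantile bins."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    bins = [0] * len(series)
    for rank, idx in enumerate(order):
        bins[idx] = min(rank * n_bins // len(series), n_bins - 1)
    return bins


def markov_transition_field(series, n_bins=4):
    """Build an n x n image from a series of length n.

    Pixel (i, j) holds the estimated probability of moving from the bin of
    x_i to the bin of x_j in one time step.
    """
    bins = quantile_bins(series, n_bins)
    # Count one-step transitions between consecutive points.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1
    # Row-normalise the counts into transition probabilities.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    # Spread the transition probabilities over all pairs of time points.
    return [[probs[bi][bj] for bj in bins] for bi in bins]
```

A series of length n yields an n x n matrix that can be saved as a grayscale image and passed to an image-based anomaly detector such as PatchCore.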
Funding: Supported by funds from the National Natural Science Foundation of China (Grant No. T2341008).
Abstract: Gastric cancer (GC), the fifth most common cancer globally, remains a leading cause of cancer deaths worldwide. Inflammation-induced tumorigenesis is the predominant process in GC development; therefore, systematic research in this area should improve understanding of the biological mechanisms that initiate GC development and promote cancer hallmarks. Here, we summarize biological knowledge regarding gastric inflammation-induced tumorigenesis, and characterize the multi-omics data and systems biology methods for investigating GC development. Of note, we highlight pioneering studies in multi-omics data and state-of-the-art network-based algorithms used for dissecting the features of gastric inflammation-induced tumorigenesis, and we propose translational applications in early GC warning biomarkers and precise treatment strategies. This review offers integrative insights for GC research, with the goal of paving the way to novel paradigms for GC precision oncology and prevention.
Funding: National Natural Science Foundation of China, Grant/Award Numbers: 61902215, 61902216, 61972226.
Abstract: The identification of tumor driver genes facilitates accurate cancer diagnosis and treatment, playing a key role in precision oncology, along with gene signaling, regulation, and their interaction with protein complexes. To tackle the challenge of distinguishing driver genes within a large volume of genomic data, we construct a feature extraction framework for discovering pan-cancer driver genes based on multi-omics data (mutations, gene expression, copy number variants, and DNA methylation) combined with protein–protein interaction (PPI) networks. Using a network propagation algorithm, we mine functional information among nodes in the PPI network, focusing on genes with weak node information to represent specific cancer information. From these functional features, we extract distribution features of pan-cancer data, pan-cancer TOPSIS features of functional features using the ideal solution method, and SetExpan features of pan-cancer data from the gene functional features, a method to rank pan-cancer data based on the average inverse rank. These features represent the information shared across cancers. Finally, we use the LightGBM classification algorithm for gene prediction. Experimental results show that our method outperforms existing methods in terms of the area under the precision-recall curve (AUPRC) and demonstrates better performance across different PPI networks. This indicates our framework's effectiveness in predicting potential cancer genes, offering valuable insights for the diagnosis and treatment of tumors.
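To make the network propagation and inverse-rank ideas concrete, the following is a minimal sketch under stated assumptions. The abstract does not say which propagation algorithm is used, so random walk with restart is swapped in here as one common choice. The "average inverse rank" is interpreted as a mean reciprocal rank across cancer types. All function names, the restart probability, and the toy inputs are hypothetical.

```python
def random_walk_with_restart(adj, seed_scores, restart=0.5, tol=1e-9, max_iter=1000):
    """Propagate seed scores over an undirected graph.

    adj maps each node to the set of its neighbours (assumed symmetric and
    without isolated nodes). Each node passes its current score equally to
    its neighbours, and a fraction `restart` of the mass returns to the seeds.
    """
    nodes = sorted(adj)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    p0 = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            spread = sum(p[m] / len(adj[m]) for m in adj[n])
            nxt[n] = (1.0 - restart) * spread + restart * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def mean_reciprocal_rank(scores_by_cancer, gene):
    """Average of 1/rank of a gene over cancer types (rank 1 = highest score)."""
    total = 0.0
    for scores in scores_by_cancer.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        total += 1.0 / (ranked.index(gene) + 1)
    return total / len(scores_by_cancer)
```

With a toy chain A-B-C-D seeded at A, the propagated score decays with distance from the seed, which is how a gene with weak signal of its own can still pick up signal from its neighbours.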
Funding: This work was supported by the National Natural Science Foundation of China (82103929, 82273713), the Young Elite Scientists Sponsorship Program by CAST (2022QNRC001), the Fundamental Research Funds for the Central Universities (WHU: 2042022kf1205), the Knowledge Innovation Program of Wuhan (whkxjsj011), and the Translational Medicine and Interdisciplinary Research Joint Fund of Zhongnan Hospital of Wuhan University (ZNJC202207) for Jianbo Tian; the Distinguished Young Scholars of China (81925032), the Key Program of National Natural Science Foundation of China (82130098), the Leading Talent Program of the Health Commission of Hubei Province, the Natural Science Foundation of Hubei Province (2019CFA009), and the Fundamental Research Funds for the Central Universities (2042022rc0026, 2042023kf1005) for Xiaoping Miao; and the National Natural Science Foundation of China (82204128) for Xiaoyang Wang.
Abstract: Genome-wide association studies (GWASs) have identified over 140 colorectal cancer (CRC)-associated loci; however, target genes at the majority of loci and the underlying molecular mechanisms are poorly understood. Here, we utilized a Bayesian approach, integrative risk gene selector (iRIGS), to prioritize risk genes at CRC GWAS loci by integrating multi-omics data. As a result, a total of 105 high-confidence risk genes (HRGs) were identified, which exhibited strong gene dependencies for CRC and enrichment in the biological processes implicated in CRC. Among the 105 HRGs, CEBPB, located at the 20q13.13 locus, acted as a transcription factor playing critical roles in cancer. Our subsequent assays indicated the tumor promoter function of CEBPB, which facilitated CRC cell proliferation by regulating multiple oncogenic pathways such as MAPK, PI3K-Akt, and Ras signaling. Next, by integrating a fine-mapping analysis and three independent case-control studies in Chinese populations consisting of 8,039 cases and 12,775 controls, we elucidated that rs1810503, a putative functional variant regulating CEBPB, was associated with CRC risk (OR = 0.90, 95% CI = 0.86–0.93, P = 1.07×10^(−7)). The association between rs1810503 and CRC risk was further validated in three additional multi-ancestry populations consisting of 24,254 cases and 58,741 controls. Mechanistically, the rs1810503 A to T allele change weakened the enhancer activity in an allele-specific manner to decrease CEBPB expression via long-range promoter-enhancer interactions, mediated by the transcription factor REST, and thus decreased CRC risk. In summary, our study provides a genetic resource and a generalizable strategy for CRC etiology investigation, and highlights the biological implications of CEBPB in CRC tumorigenesis, shedding new light on the etiology of CRC.
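As a reading aid for the reported association statistics, the sketch below back-calculates the standard error, z-score, and two-sided p-value implied by an odds ratio and its 95% confidence interval. It assumes a Wald-type interval on the log-odds scale, which the abstract does not state, and the function name is hypothetical. Because the published OR and CI are rounded, the result should only be expected to match the reported P in order of magnitude.

```python
import math


def p_value_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate the Wald statistics implied by an OR and its 95% CI.

    Returns (standard error of log OR, z-score, two-sided p-value).
    """
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values quoted in the abstract for rs1810503.
se, z, p = p_value_from_or_ci(0.90, 0.86, 0.93)
print(f"SE(log OR)={se:.4f}, z={z:.2f}, p={p:.2e}")
```

An OR below 1 gives a negative z-score, consistent with the protective direction of the effect described in the abstract.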
Funding: Supported by the Young Top-notch Talent Cultivation Program of Hubei Province, the Natural Science Foundation for Distinguished Young Scientists of Hubei Province (2021CFA058), and the First-Class Discipline Construction Funds of College of Plant Science and Technology, Huazhong Agricultural University (2023ZKPY005).
Abstract: Metabolic network construction plays a pivotal role in unraveling the regulatory mechanism of biological activities, although it often proves to be challenging and labor-intensive, particularly with non-model organisms. In this study, we develop a computational approach that employs reaction models based on structure-guided chemical modification and related compounds to construct a metabolic network in wheat. This construction results in a comprehensive structure-guided network, including 625 identified metabolites and an additional 333 putative reactions compared with the Kyoto Encyclopedia of Genes and Genomes database. Using a combination of gene annotation, reaction classification, structure similarity, and correlations from transcriptome and metabolome analysis, a total of 229 potential genes related to these reactions are identified within this network. To validate the network, the functionality of a hydroxycinnamoyltransferase (TraesCS3D01G314900) for the synthesis of polyphenols and a rhamnosyltransferase (TraesCS2D01G078700) for the modification of flavonoids are verified through in vitro enzymatic studies and wheat mutant tests, respectively. Our research thus supports the utility of structure-guided chemical modification as an effective tool in identifying causal candidate genes for constructing metabolic networks and further in metabolomic genetic studies.
Funding: The Self-supporting Program of Guangzhou Laboratory (SRPG22-007); the R&D Program of Guangzhou National Laboratory (GZNL2024A01002); the National Natural Science Foundation of China (12371485, 11871456); the II Phase External Project of Guoke Ningbo Life Science and Health Industry Research Institute (2020YJY0217); the Science and Technology Project of Yunnan Province (202103AQ100002); the National Key R&D Program of China (2022YFF1202100); and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB38050200, XDB38040202, XDA26040304).
Abstract: Childhood asthma is one of the most common respiratory diseases, with rising mortality and morbidity. Multi-omics data provide a new opportunity to explore collaborative biomarkers and corresponding diagnostic models of childhood asthma. To capture the nonlinear association of multi-omics data and improve the interpretability of the diagnostic model, we proposed a novel deep association model (DAM) and a corresponding efficient analysis framework. First, Deep Subspace Reconstruction was used to fuse the omics data and diagnostic information, thereby correcting the distribution of the original omics data and reducing the influence of unnecessary data noise. Second, Joint Deep Semi-Negative Matrix Factorization was applied to identify different latent sample patterns and extract biomarkers from different omics data levels. Third, our newly proposed Deep Orthogonal Canonical Correlation Analysis can rank features in the collaborative module, which can then be used to construct a diagnostic model that considers the nonlinear correlation between different omics data levels. Using DAM, we analyzed the transcriptome and methylation data of childhood asthma in depth. The effectiveness of DAM is verified from the perspectives of algorithm performance and biological significance on the independent test dataset, by ablation experiments and comparison with many baseline methods from clinical and biological studies. The DAM-induced diagnostic model can achieve a prediction AUC of 0.912, which is higher than that of many other alternative methods. Meanwhile, relevant pathways and biomarkers of childhood asthma are also recognized to be collectively altered at the gene expression and methylation levels. As an interpretable machine learning approach, DAM simultaneously considers the nonlinear associations among samples and those among biological features, which should help explore interpretable biomarker candidates and efficient diagnostic models from multi-omics data analysis for complex human diseases.
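For readers unfamiliar with the AUC metric quoted above, the following sketch computes it directly from its probabilistic definition. It is a generic illustration rather than the evaluation code behind the reported 0.912, and the labels and scores are made-up toy values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample receives a
    higher score than a randomly chosen negative one; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = asthma case, 0 = control (hypothetical values).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(roc_auc(labels, scores), 3))  # 8 of 9 pairs ordered correctly -> 0.889
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; rank-based formulas give the same value more efficiently on large test sets.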
Funding: Supported by the Young Elite Scientists Sponsorship Program by China Association for Science and Technology (Grant No. 2020QNRC001) and the research start-up funding from Nanjing Medical University, China (Grant No. KY116RC20200007).
Abstract: Single-cell multi-omics sequencing has greatly accelerated reproductive research in recent years, and the data are continually growing. However, utilizing these data resources is challenging for wet-lab researchers. A comprehensive platform for exploring single-cell multi-omics data related to reproduction is urgently needed. Here, we introduce the single-cell multi-omics atlas of reproduction (SMARTdb), an integrative and user-friendly platform for exploring molecular dynamics of reproductive development, aging, and disease, which covers multi-omics, multi-species, and multi-stage data. We curated and analyzed single-cell transcriptomic and epigenomic data of over 2.0 million cells from 6 species across the entire lifespan. A series of powerful functionalities are provided, such as "Query gene expression", "DIY expression plot", "DNA methylation plot", and "Epigenome browser". With SMARTdb, we found that the male germ cell-specific expression pattern of RPL39L and RPL10L is conserved between human and other model animals. Moreover, DNA hypomethylation and open chromatin may collectively regulate the specific expression pattern of RPL39L in both male and female germ cells. In summary, SMARTdb is a powerful platform for convenient data mining and gaining novel insights into reproductive development, aging, and disease. SMARTdb is publicly available at https://smart-db.cn.
Funding: National Key R&D Program of China (2021YFD1201200); National Science Foundation of China (32022064); Project of Hainan Yazhou Bay Seed Lab (B21HJ0223); Innovation Program of the Chinese Academy of Agricultural Sciences.
Abstract: Genomic prediction is an effective way to accelerate the rate of agronomic trait improvement in plants. Traditional methods typically use linear regression models with clear assumptions; such methods are unable to capture the complex relationships between genotypes and phenotypes. Non-linear models (e.g., deep neural networks) have been proposed as a superior alternative to linear models because they can capture complex non-additive effects. Here we introduce a deep learning (DL) method, deep neural network genomic prediction (DNNGP), for integration of multi-omics data in plants. We trained DNNGP on four datasets and compared its performance with methods built with five classic models: genomic best linear unbiased prediction (GBLUP); two methods based on a machine learning (ML) framework, light gradient boosting machine (LightGBM) and support vector regression (SVR); and two methods based on a DL framework, deep learning genomic selection (DeepGS) and deep learning genome-wide association study (DLGWAS). DNNGP is novel in five ways. First, it can be applied to a variety of omics data to predict phenotypes. Second, the multilayered hierarchical structure of DNNGP dynamically learns features from raw data, avoiding overfitting and improving the convergence rate using a batch normalization layer, early stopping, and rectified linear activation (rectified linear unit) functions. Third, when small datasets were used, DNNGP produced results that are competitive with results from the other five methods, and it showed greater prediction accuracy than the other methods when large-scale breeding data were used. Fourth, the computation time required by DNNGP was comparable with that of commonly used methods and up to 10 times faster than DeepGS. Fifth, hyperparameters can easily be batch tuned on a local machine. Compared with GBLUP, LightGBM, SVR, DeepGS and DLGWAS, DNNGP is superior to these existing widely used genomic selection (GS) methods. Moreover, DNNGP can generate robust assessments from diverse datasets, including omics data, and quickly incorporate complex and large datasets into usable models, making it a promising and practical approach for straightforward integration into existing GS platforms.
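The early stopping mentioned in this abstract can be illustrated with a short sketch. It shows the generic patience-based rule, not DNNGP's actual training loop. The function name, the patience of 3, and the loss values are assumptions chosen for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Apply a patience-based early-stopping rule to validation losses.

    Returns (best_epoch, stop_epoch): the epoch with the lowest loss seen
    before stopping, and the epoch at which training would halt because the
    loss failed to improve by more than min_delta for `patience` epochs.
    """
    best = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    # Ran out of epochs before the patience budget was exhausted.
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: the improvement at epoch 6 is never reached.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
print(early_stopping_epoch(losses))  # (2, 5)
```

The example also shows the trade-off of the rule: a small patience saves training time but can stop before a later improvement, which is why the patience is usually tuned alongside other hyperparameters.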
Funding: Supported in part by the National Natural Science Foundation of China (61872216, T2125007 to JZ, 31900862 to DZ), the National Key Research and Development Program of China (2018YFC0910404, 2021YFF1201300), the Turing AI Institute of Nanjing, the Tsinghua-Toyota Joint Research Fund, and the US National Institute of Health grant (1R01NS125018).
Abstract: Background: Chromatin-associated RNA (caRNA) acts as a ubiquitous epigenetic layer in eukaryotes, and has been reported to be essential in various biological processes, including gene transcription, chromatin remodeling and cellular differentiation. Recently, numerous experimental techniques have been developed to characterize genome-wide RNA-chromatin interactions to understand their underlying biological functions. However, these experimental methods are generally expensive, time-consuming, and limited in identifying all potential sites, while most of the existing computational methods are restricted to detecting only specific types of RNAs interacting with chromatin. Methods: Here, we propose a highly interpretable computational framework, named DeepRCI, to identify the interactions between various types of RNAs and chromatin. In this framework, we introduce a novel deep learning component called variformer and integrate multi-omics data to capture intrinsic genomic features at both RNA and DNA levels. Results: Extensive experiments demonstrate that DeepRCI can detect RNA-chromatin interactions more accurately than state-of-the-art baseline prediction methods. Furthermore, the sequence features extracted by DeepRCI can be well matched to known critical gene regulatory components, indicating that our model can provide useful biological insights into the underlying mechanisms of RNA-chromatin interactions. In addition, based on the prediction results, we further delineate the relationships between RNA-chromatin interactions and cellular functions, including gene expression and the modulation of cell states. Conclusions: In summary, DeepRCI can serve as a useful tool for characterizing RNA-chromatin interactions and studying the underlying gene regulatory code.
Funding: Supported by a Lee Kong Chian School of Medicine Dean's Postdoctoral Fellowship (021207-00001) from Nanyang Technological University (NTU) Singapore and a Mistletoe Research Fellowship (022522-00001) from the Momental Foundation USA. Jialiu Zeng is supported by a Presidential Postdoctoral Fellowship (021229-00001) from NTU Singapore and an Open Fund Young Investigator Research Grant (OF-YIRG) (MOH-001147) from the National Medical Research Council (NMRC) Singapore. Su Bin Lim is supported by the National Research Foundation (NRF) of Korea (Grant Nos.: 2020R1A6A1A03043539, 2020M3A9D8037604, 2022R1C1C1004756) and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant No.: HR22C1734).
Abstract: Bioinformatic analysis of large and complex omics datasets has become increasingly useful in modern-day biology by providing a great depth of information, with its application to neuroscience termed neuroinformatics. Data mining of omics datasets has enabled the generation of new hypotheses based on differentially regulated biological molecules associated with disease mechanisms, which can be tested experimentally for improved diagnostic and therapeutic targeting of neurodegenerative diseases. Importantly, integrating multi-omics data using a systems bioinformatics approach will advance the understanding of the layered and interactive network of biological regulation that exchanges systemic knowledge to facilitate the development of a comprehensive human brain profile. In this review, we first summarize data mining studies utilizing datasets from individual types of omics analysis, including epigenetics/epigenomics, transcriptomics, proteomics, metabolomics, lipidomics, and spatial omics, pertaining to Alzheimer's disease, Parkinson's disease, and multiple sclerosis. We then discuss multi-omics integration approaches, including independent biological integration and unsupervised integration methods, for more intuitive and informative interpretation of the biological data obtained across different omics layers. We further assess studies that integrate multi-omics in data mining, which provide convoluted biological insights and offer proof-of-concept propositions towards systems bioinformatics in the reconstruction of brain networks. Finally, we recommend a combination of high-dimensional bioinformatics analysis with experimental validation to achieve translational neuroscience applications including biomarker discovery, therapeutic development, and elucidation of disease mechanisms. We conclude by providing future perspectives and opportunities in applying integrative multi-omics and systems bioinformatics to achieve precision phenotyping of neurodegenerative diseases and to move towards personalized medicine.
Funding: Supported by the National Natural Science Foundation of China (32070559), the National Key Research and Development Plan of China (2021YFF1000100), the China Postdoctoral Science Foundation (2022M710875), the Hubei Hongshan Laboratory (2021HSZD004), and the Developing Bioinformatics Platform in Hainan Yazhou Bay Seed Lab (no. JBGS-B21HJ0001).
Abstract: In the post-genome-wide association study era, multi-omics techniques have shown great power and potential for candidate gene mining and functional genomics research. However, due to the lack of effective data integration and multi-omics analysis platforms, such techniques have still not been widely applied in rapeseed, an important oil crop worldwide. Here, we report a rapeseed multi-omics database (BnIR; http://yanglab.hzau.edu.cn/BnIR), which provides datasets of six omics including genomics, transcriptomics, variomics, epigenetics, phenomics, and metabolomics, as well as numerous "variation-gene expression-phenotype" associations obtained using multiple statistical methods. In addition, a series of multi-omics search and analysis tools are integrated to facilitate the browsing and application of these datasets. BnIR is the most comprehensive multi-omics database for rapeseed so far, and two case studies demonstrated its power to mine candidate genes associated with specific traits and analyze their potential regulatory mechanisms.
Funding: Supported by the BRAVE Agile Investment from the PNNL.
Abstract: Background: Physiological and biochemical processes across tissues of the body are regulated in response to the high demands of intense physical activity in several occupations, such as firefighting, law enforcement, military, and sports. A better understanding of such processes can ultimately help improve human performance and prevent illnesses in the work environment. Methods: To study regulatory processes in intense physical activity simulating real-life conditions, we performed a multi-omics analysis of 3 biofluids (blood plasma, urine, and saliva) collected from 11 wildland firefighters before and after a 45 min, intense exercise regimen. Omics profiles post- vs. pre-exercise were compared by Student's t-test followed by pathway analysis and comparison between the different omics modalities. Results: Our multi-omics analysis identified and quantified 3835 proteins, 730 lipids and 182 metabolites combining the 3 different types of samples. The blood plasma analysis revealed signatures of tissue damage and acute repair response accompanied by enhanced carbon metabolism to meet energy demands. The urine analysis showed a strong, concomitant regulation of 6 out of 8 identified proteins from the renin-angiotensin system, supporting increased excretion of catabolites, reabsorption of nutrients and maintenance of fluid balance. In saliva, we observed a decrease in 3 pro-inflammatory cytokines and an increase in 8 antimicrobial peptides. A systematic literature review identified 6 papers that support an altered susceptibility to respiratory infection. Conclusions: This study shows simultaneous regulatory signatures in biofluids indicative of homeostatic maintenance during intense physical activity with possible effects on increased infection susceptibility, suggesting that caution against respiratory diseases could benefit workers in highly physically demanding jobs.
Funding: Supported by the Chinese Academy of Agricultural Sciences Innovation Project (Grant No. CAASASTIP-2013CNRRI) and the Fundamental Research Funds for Central Public Welfare Research Institutes of Chinese Rice Research Institute (Grant No. CPSIBRF-CNRRI-202102).
Abstract: Accurate genomic information is essential for advancing genetic breeding research in specific rice varieties. This study presented a gapless genome assembly of the indica rice cultivar Zhonghui 8015 (ZH8015) using PacBio HiFi, Hi-C, and ONT (Oxford Nanopore Technologies) ultra-long sequencing technologies, annotating 43037 gene structures. Subsequently, utilizing this genome along with transcriptomic and metabolomic techniques, we explored ZH8015's response to brown planthopper (BPH) infestation. Continuous transcriptomic sampling indicated significant changes in gene expression levels around 48 h after BPH feeding. Enrichment analysis revealed particularly significant alterations in genes related to reactive oxygen species scavenging and cell wall formation. Metabolomic results demonstrated marked increases in the levels of several monosaccharides, which are components of the cell wall, and dramatic changes in flavonoid contents. Omics association analysis identified differentially expressed genes associated with key metabolites, shedding light on ZH8015's response to BPH infestation. In summary, this study constructed a reliable genome sequence resource for ZH8015, and the preliminary multi-omics results will guide future insect-resistant breeding research.
Funding: Supported by grants from the European Union under the Horizon 2020 programme (MultipleMS grant agreement 733161) to NK, and from the Spanish Government, through project PID2019-111192GA-I00 (MICINN), to DGC.
Abstract: Background: Single-cell multi-omics technologies allow a profound system-level biology understanding of cells and tissues. However, an integrative and possibly systems-based analysis capturing the different modalities is challenging. In response, bioinformatics and machine learning methodologies are being developed for multi-omics single-cell analysis. It is unclear whether current tools can address the dual aspect of modality integration and prediction across modalities without requiring extensive parameter fine-tuning. Methods: We designed LIBRA, a neural network based framework, to learn translation between paired multi-omics profiles so that a shared latent space is constructed. Additionally, we implemented a variation, aLIBRA, that allows automatic fine-tuning by identifying parameter combinations that optimize both the integrative and predictive tasks. All model parameters and evaluation metrics are made available to users with minimal user iteration. Furthermore, aLIBRA allows experienced users to implement custom configurations. The LIBRA toolbox is freely available as R and Python libraries at GitHub (TranslationalBioinformaticsUnit/LIBRA). Results: LIBRA was evaluated in eight multi-omic single-cell datasets, including three combinations of omics. We observed that LIBRA is a state-of-the-art tool when evaluating the ability to increase cell-type (clustering) resolution in the integrated latent space. Furthermore, when assessing the predictive power across data modalities, such as predicting chromatin accessibility from gene expression, LIBRA outperforms existing tools. As expected, adaptive parameter optimization (aLIBRA) significantly boosted the performance of learning predictive models from paired datasets. Conclusion: LIBRA is a versatile tool that performs competitively in both "integration" and "prediction" tasks based on single-cell multi-omics data. LIBRA is a data-driven robust platform that includes an adaptive learning scheme.
Funding: Supported by the National Natural Science Foundation of China (31772556), the Local Innovative and Research Teams Project of Guangdong Province (2019BT02N630), the grants from the earmarked fund for China Agriculture Research System (CARS-35), and the Science and Technology Innovation Strategy projects of Guangdong Province (Grant No. 2018B020203002).
Abstract: Background: Presently, multi-omics data (e.g., genomics, transcriptomics, proteomics, and metabolomics) are available to improve genomic predictors. Omics data not only offer new data layers for genomic prediction but also provide a bridge between organismal phenotypes and genome variation that cannot be readily captured at the genome sequence level. Therefore, using multi-omics data to select feature markers is a feasible strategy to improve the accuracy of genomic prediction. In this study, simultaneously using whole-genome sequencing (WGS) and gene expression level data, four strategies for single-nucleotide polymorphism (SNP) preselection were investigated for genomic predictions in the Drosophila Genetic Reference Panel. Results: Using genomic best linear unbiased prediction (GBLUP) with complete WGS data, the prediction accuracies were 0.208±0.020 (0.181±0.022) for the startle response and 0.272±0.017 (0.307±0.015) for starvation resistance in the female (male) lines. Compared with GBLUP using complete WGS data, neither GBLUP nor the genomic feature BLUP (GFBLUP) improved the prediction accuracy using SNPs preselected from complete WGS data based on the results of genome-wide association studies (GWASs) or transcriptome-wide association studies (TWASs). Furthermore, by using SNPs preselected from the WGS data based on the results of the expression quantitative trait locus (eQTL) mapping of all genes, only the startle response had greater accuracy than GBLUP with the complete WGS data. The best accuracy values in the female and male lines were 0.243±0.020 and 0.220±0.022, respectively. Importantly, by using SNPs preselected based on the results of the eQTL mapping of significant genes from TWAS, both GBLUP and GFBLUP resulted in high accuracy and small bias of genomic prediction. Compared with the GBLUP using complete WGS data, the best accuracy values represented increases of 60.66% and 39.09% for starvation resistance and 27.40% and 35.36% for the startle response in the female and male lines, respectively. Conclusions: Overall, multi-omics data can assist genomic feature preselection and improve the performance of genomic prediction. The new knowledge gained from this study will enrich the use of multi-omics in genomic prediction.
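The accuracy figures above can be related to one another with a small worked example. The sketch assumes that prediction accuracy is measured as the Pearson correlation between predicted and observed phenotypes, a common convention that the abstract does not state explicitly. The helper names are hypothetical, and the implied accuracy is simple arithmetic on the rounded figures, not a value reported by the authors.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def relative_gain_percent(baseline, improved):
    """Percentage increase of `improved` over `baseline`."""
    return 100.0 * (improved - baseline) / baseline


def implied_accuracy(baseline, gain_percent):
    """Accuracy implied by a baseline and a reported percentage increase."""
    return baseline * (1.0 + gain_percent / 100.0)


# Starvation resistance, female lines: baseline 0.272 and a 60.66% increase.
print(round(implied_accuracy(0.272, 60.66), 3))  # 0.437
```

Reading the percentages this way makes clear that they are relative gains over the GBLUP baseline, not absolute differences in correlation.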
Funding: This work was supported by the National Key Research and Development Program of China (2019YFD1002701-02), the National Natural Science Foundation of China (32170371), and the Strategic Priority Research Program of Chinese Academy of Sciences (XDA24030503).
Abstract: Natural rubber (NR) is an irreplaceable biopolymer of economic and strategic importance owing to its unique physical and chemical properties. The Pará rubber tree (Hevea brasiliensis (Willd. ex A. Juss.) Müll. Arg.) is currently the exclusive commercial source of NR, and it is primarily grown in plantations restricted to the tropical and subtropical areas of Southeast Asia. However, current Pará rubber production barely meets the sharply increasing global industrial demand for rubber. Petroleum-based synthetic rubber (SR) has been used to supplement the shortage of NR, but its industrial performance is not comparable to that of NR. Thus, there is an urgent need to develop new productive rubber crops with broader environmental adaptability. This review summarizes the current research progress on alternative rubber-producing plants, including horticultural plants (Taraxacum kok-saghyz Rodin and Lactuca L. species), woody plants (Parthenium argentatum A. Gray and Eucommia ulmoides Oliv.), and other plant species with potential for NR production. With an emphasis on the molecular basis of NR biosynthesis revealed by a multi-omics approach, we highlight new integrative strategies and biotechnologies for exploring the mechanism of NR biosynthesis with a broader scope, which may accelerate the breeding and improvement of new rubber crops.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos.: U21A20418, 82003727, 82273903) and the Zhejiang Provincial Natural Science Foundation, China (Grant No.: LQ21H310002).
Abstract: Recent studies have highlighted spatially resolved multi-omics technologies, including spatial genomics, transcriptomics, proteomics, and metabolomics, as powerful tools to decipher the spatial heterogeneity of the brain. Here, we focus on two major approaches in spatial transcriptomics (next-generation sequencing-based technologies and image-based technologies), and on mass spectrometry imaging technologies used in spatial proteomics and spatial metabolomics. Furthermore, we discuss their applications in neuroscience, including building the brain atlas, uncovering gene expression patterns of neurons for special behaviors, deciphering the molecular basis of neuronal communication, and providing a more comprehensive explanation of the molecular mechanisms underlying central nervous system disorders. However, further efforts are still needed toward the integrative application of multi-omics technologies, including real-time spatial multi-omics analysis in living cells, detailed gene profiling in a whole-brain view, and the combination with functional verification.
Funding: Supported by China’s National Natural Science Foundation (Nos. 62072249, 62072056). This work is also funded by the National Science Foundation of Hunan Province (2020JJ2029).
Abstract: With the development of Industry 4.0 and big data technology, the Industrial Internet of Things (IIoT) is hampered by inherent issues such as privacy, security, and fault tolerance, which pose challenges to its rapid development. Blockchain technology offers immutability, decentralization, and autonomy, which can greatly mitigate the inherent defects of the IIoT. In the traditional blockchain, data is stored in a Merkle tree. As data continues to grow, the scale of the proofs used to validate it grows as well, threatening the efficiency, security, and reliability of blockchain-based IIoT. Accordingly, this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data. To solve this problem, a new Vector Commitment (VC) structure, Partition Vector Commitment (PVC), is proposed by improving the traditional VC structure. Secondly, this paper uses PVC instead of the Merkle tree to store big data generated by IIoT. PVC can improve the efficiency of traditional VC in the process of commitment and opening. Finally, this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative experimental analysis. This mechanism can greatly reduce communication loss and maximize the rational use of storage space, which is of great significance for maintaining the security and stability of blockchain-based IIoT.
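The Merkle-tree baseline that this abstract contrasts with PVC can be illustrated with a minimal sketch. It is not the paper's PVC construction, which the abstract does not specify; it only shows how an inclusion proof in a plain binary Merkle tree gains one hash per tree level as the data set grows. The function names, the SHA-256 hash, and the duplicate-last-node padding rule are assumptions chosen for illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """SHA-256 digest (hash choice is an assumption for this sketch)."""
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every level of the tree, from hashed leaves up to the root."""
    level = [_h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Pad odd-sized levels by duplicating the last node.
            level = level + [level[-1]]
            levels[-1] = level
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hash at each level for the leaf at `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        # Record the sibling and whether it sits to the left of our node.
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root, leaf, proof):
    """Recompute the root from a leaf and its proof, then compare."""
    node = _h(leaf)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


if __name__ == "__main__":
    for n in (8, 1024):
        data = [f"reading-{i}".encode() for i in range(n)]
        levels = build_levels(data)
        proof = prove(levels, 5)
        print(n, "leaves ->", len(proof), "hashes in proof,",
              "valid" if verify(levels[-1][0], data[5], proof) else "invalid")
```

Each proof carries one sibling hash per level, so its size grows with the logarithm of the number of stored items; this is the growth the abstract identifies as a burden for blockchain-based IIoT.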
Abstract: To address the problems of a single encryption algorithm, such as low encryption efficiency and unreliable metadata for static data storage on big data platforms in the cloud computing environment, we propose a Hadoop-based big data secure storage scheme. Firstly, to distribute the NameNode service from a single server to multiple servers, we combine the HDFS federation and HDFS high-availability mechanisms, and use the Zookeeper distributed coordination mechanism to coordinate each node to achieve dual-channel storage. Then, we improve the ECC encryption algorithm for the encryption of ordinary data, and adopt a homomorphic encryption algorithm to encrypt data that needs to be computed on. To accelerate the encryption, we adopt a dual-thread encryption mode. Finally, the HDFS control module is designed to combine the encryption algorithm with the storage model. Experimental results show that the proposed solution solves the single-point-of-failure problem for metadata, performs well in terms of metadata reliability, and can realize server fault tolerance. The improved encryption algorithm integrates the dual-channel storage mode, and the encryption storage efficiency improves by 27.6% on average.
Funding: This research was financially supported by the Ministry of Trade, Industry, and Energy (MOTIE), Korea, under the “Project for Research and Development with Middle Markets Enterprises and DNA (Data, Network, AI) Universities” (AI-based Safety Assessment and Management System for Concrete Structures) (Reference Number P0024559), supervised by the Korea Institute for Advancement of Technology (KIAT).
Abstract: Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for data augmentation and for converting time series data into images for analysis have been studied. This paper proposes a fault detection model that uses time series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. Data augmentation is performed by adding Gaussian noise, with the noise level set to 0.002, to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to visualize the dynamic transitions of the data while converting the time series data into images. This enables the identification of patterns in time series data and helps capture the sequential dependencies of the data. For anomaly detection, the PatchCore model is applied and shows excellent performance, and the detected anomaly areas are represented as heat maps. This allows anomalies to be detected, and applying the anomaly map to the original image makes it possible to locate the areas where anomalies occur. The performance evaluation shows that both F1-score and accuracy are high when time series data are converted to images. Additionally, when the data were processed as images rather than as time series, there was a significant reduction in both the size of the data and the training time. The proposed method can provide an important springboard for research in the field of anomaly detection using time series data. It also helps with problems such as analyzing complex patterns in data in a lightweight manner.
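The two preprocessing steps named in this abstract, Gaussian-noise augmentation and the Markov Transition Field, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the stated noise level of 0.002 is the standard deviation of the added noise, and the function names, the quantile binning, the bin count, and the fixed seed are all hypothetical choices. The PatchCore detection stage is not shown.

```python
import bisect
import random


def add_gaussian_noise(series, sigma=0.002, seed=0):
    """Augment a series with zero-mean Gaussian noise.

    Assumption: the abstract's "noise level" is read as the standard deviation.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]


def markov_transition_field(series, n_bins=4):
    """Return an n x n image whose pixel (i, j) is the transition probability
    from the quantile bin of series[i] to the quantile bin of series[j]."""
    n = len(series)
    # Quantile bin edges so that each bin holds roughly the same number of points.
    ordered = sorted(series)
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    bins = [bisect.bisect_right(edges, x) for x in series]

    # Count first-order transitions between consecutive time steps.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1

    # Normalise each row into transition probabilities.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])

    # Spread the probabilities over every pair of time steps.
    return [[probs[bins[i]][bins[j]] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    signal = [(t % 10) / 10 for t in range(40)]  # toy sawtooth signal
    image = markov_transition_field(add_gaussian_noise(signal))
    print(len(image), "x", len(image[0]), "MTF image")
```

The resulting matrix can be rendered as a grayscale image and passed to an image-based anomaly detector; the image side length equals the series length, so long series are typically downsampled or windowed first.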