Abstract
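With advances in high-throughput sequencing, integrating multi-omics and multi-modal data has become an important trend in complex disease research. It offers new perspectives on how diseases arise and progress, and provides important support for the precision diagnosis and treatment of complex diseases. This review first introduces the omics types used in complex disease research, such as genomics, transcriptomics, proteomics, metabolomics, microbiomics, and radiomics, along with the corresponding multi-omics databases. It then systematically classifies methods for integrating multi-omics and multi-modal data, describing in detail correlation- and network-based methods as well as the early, intermediate, and late integration approaches within data matrix- and machine learning-based methods. It also discusses applications of multi-omics integration models in disease screening, subtyping, prognosis, and drug response prediction. Finally, it summarizes the current challenges of multi-omics integration at three levels (sample, data, and model) and outlines future directions. The review offers a systematic survey of multi-omics and multi-modal data integration in complex diseases and is of considerable significance for the further development of the field.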
With the advancement of high-throughput sequencing technologies, the integration of multi-omics and multi-modal data has become an important trend in the study of complex diseases. Multi-omics and multi-modal data provide new perspectives for a deeper understanding of the pathogenesis and development of diseases, offering crucial support for the precision diagnosis and treatment of complex diseases. This review first introduces various types of omics data and their contributions to complex disease research. Genomics reveals the genetic background and mutations associated with diseases by analyzing gene sequences; transcriptomics uncovers disease-related gene regulatory relationships by studying expression patterns; proteomics focuses on the expression, modification, and interactions of proteins; metabolomics reflects adjustments in metabolic pathways before and after illness through changes in metabolites; radiomics shows disease-induced alterations via medical imaging. Integrating and analyzing these omics data can compensate for the information gaps of any single omics layer, enabling a more comprehensive understanding of the molecular mechanisms of complex diseases. Furthermore, this review introduces multi-omics databases related to complex diseases, covering cancer, cardiovascular and cerebrovascular diseases, organ fibrosis, chronic kidney disease, Alzheimer’s disease, and inflammatory bowel disease. These databases help researchers obtain and analyze multi-omics data. Next, this review systematically categorizes existing multi-omics integration methods into two types: correlation- and network-based methods, and data matrix- and machine learning-based methods. The two approaches use different means to reveal potential connections between data, thereby providing deeper insights into complex biological systems. Correlation- and network-based methods use association analysis or complex network analysis to identify the intrinsic connections between different omics, thereby discovering biomarkers related to phenotypes. Data matrix- and machine learning-based methods use statistical analysis, machine learning, and deep learning models to fuse data for clustering or classification tasks, while revealing the inherent relationships between multi-omics data and identifying disease-related biomarkers. Data matrix- and machine learning-based methods are further divided into early integration, intermediate integration, and late integration. Early integration merges multi-omics data into a joint matrix and then applies machine learning or deep learning models for classification. Intermediate integration models each omics dataset separately and then integrates the transformed matrices or models. Late integration models each omics dataset independently and then combines the model outputs. Building on this, the review also discusses the applications of multi-omics integration models in complex diseases, such as disease screening, subtyping, prognosis, and drug response prediction. Finally, this review summarizes the current challenges in multi-omics and multi-modal data integration, which are divided into three levels: sample, data, and model. At the sample level, the absence of matched data limits the efficacy of integration methods, and researchers are addressing this issue through data sharing and the development of new algorithms. At the data level, high dimensionality, noise, and heterogeneity necessitate more efficient deep learning methods for data integration. At the model level, the key challenges include lack of interpretability, low computational efficiency, and privacy concerns. Researchers are enhancing model interpretability through visualization tools and by incorporating biological prior knowledge into deep learning models, while also exploring new technologies such as model acceleration and privacy-preserving computation to improve model efficiency and security. Despite the challenges, multi-omics integration has demonstrated significant potential in the diagnosis and treatment of complex diseases.
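As a rough illustration of the distinction between early and late integration drawn in the abstract, the following minimal Python sketch contrasts the two on toy data. Everything in it is an assumption made for illustration and does not come from the review: the nearest-centroid classifier, the softmax scoring, the function names (`fit_centroids`, `class_probs`, `early_integration`, `late_integration`), and the invented "expression" and "metabolite" values. Intermediate integration is not shown.

```python
import math
from statistics import mean

def fit_centroids(X, y):
    """Toy 'model': the mean feature vector of each class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def class_probs(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    neg_d = {lab: -sum((a - b) ** 2 for a, b in zip(x, c))
             for lab, c in centroids.items()}
    top = max(neg_d.values())  # subtract the max for numerical stability
    exp = {lab: math.exp(v - top) for lab, v in neg_d.items()}
    total = sum(exp.values())
    return {lab: e / total for lab, e in exp.items()}

def concat_blocks(blocks):
    """Join the per-omics feature rows of each sample into one long row."""
    return [[v for row in rows for v in row] for rows in zip(*blocks)]

def early_integration(omics_train, y, omics_test):
    """Merge all omics into one joint matrix, then fit a single model."""
    model = fit_centroids(concat_blocks(omics_train), y)
    preds = []
    for x in concat_blocks(omics_test):
        p = class_probs(model, x)
        preds.append(max(p, key=p.get))
    return preds

def late_integration(omics_train, y, omics_test):
    """Fit one model per omics, then average the per-omics class probabilities."""
    models = [fit_centroids(X, y) for X in omics_train]
    preds = []
    for i in range(len(omics_test[0])):
        per_omics = [class_probs(m, X[i]) for m, X in zip(models, omics_test)]
        avg = {lab: mean(p[lab] for p in per_omics) for lab in per_omics[0]}
        preds.append(max(avg, key=avg.get))
    return preds

# Invented toy data: rows are matched samples, one matrix per omics layer.
expr_train = [[5.0, 1.0], [4.5, 1.2], [1.0, 4.0], [1.2, 4.4]]
metab_train = [[0.9], [1.1], [0.1], [0.2]]
labels = ["case", "case", "control", "control"]
expr_test = [[4.8, 1.1]]
metab_test = [[1.0]]

early_pred = early_integration([expr_train, metab_train], labels,
                               [expr_test, metab_test])
late_pred = late_integration([expr_train, metab_train], labels,
                             [expr_test, metab_test])
print(early_pred, late_pred)
```

In this sketch the early variant lets features from different omics share a single distance computation, so in practice it would be sensitive to scale differences between layers. The late variant keeps each layer in its own model and only combines the outputs.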
Authors
刘晓帆
鲁志
Xiaofan Liu; Zhi John Lu (MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China; Institute for Precision Medicine, Tsinghua University, Beijing 100084, China)
Source
《科学通报》
EI
CAS
CSCD
PKU Core Journal
2024, Issue 30, pp. 4432-4446 (15 pages)
Chinese Science Bulletin