Funding: supported by the NSFC project (Grant No. 11631012).
Abstract: Single-cell RNA sequencing reveals the gene structure and gene expression status of individual cells, which can reflect the heterogeneity between cells. However, batch effects caused by non-biological factors may hinder data integration and downstream analysis. Although batch effects can be evaluated by visualizing the data, such evaluation is subjective and inaccurate. In this work, we propose a quantitative method, cKBET, which considers batch and cell-type information simultaneously. cKBET assesses batch effects by comparing the global and local fractions of cells from different batches within each cell type. We verify the performance of cKBET on simulated and real biological data sets. The experimental results show that cKBET is superior to existing methods in most cases. In general, cKBET can detect batch effects with either balanced or unbalanced cell types, and can thus be used to evaluate batch-correction methods.
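The abstract does not give cKBET's exact statistic. The following is a minimal sketch of the general idea, assuming a kBET-style Pearson chi-square test applied separately within each cell type: for each cell, the batch composition of its k nearest same-type neighbours (the local fraction) is compared with the batch composition of the whole cell type (the global fraction), and the per-type rejection rate summarizes the batch effect. It is not the authors' implementation; the function names, the two-batch restriction, brute-force Euclidean neighbours and the defaults k=6 and alpha=0.05 are all illustrative assumptions. With such a small k the expected counts are low, so the chi-square p-values are only rough.

```python
import math
from collections import Counter, defaultdict


def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def knn(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] (brute-force Euclidean)."""
    ranked = sorted(
        (math.dist(points[idx], points[j]), j) for j in range(len(points)) if j != idx
    )
    return [j for _, j in ranked[:k]]


def celltype_kbet(points, batches, cell_types, k=6, alpha=0.05):
    """Per-cell-type rejection rate of a kBET-style neighbourhood test (two batches only).

    Within each cell type, the expected ("global") batch fractions are those of
    the whole cell type; each cell's k nearest same-type neighbours give the
    observed ("local") counts, compared by a Pearson chi-square test.
    """
    labels = sorted(set(batches))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two batches")
    by_type = defaultdict(list)
    for i, t in enumerate(cell_types):
        by_type[t].append(i)

    rates = {}
    for t, members in by_type.items():
        counts = Counter(batches[i] for i in members)
        # Skip cell types with too few cells or present in only one batch:
        # the test is undefined there.
        if len(members) <= k or any(counts[b] == 0 for b in labels):
            continue
        global_frac = {b: counts[b] / len(members) for b in labels}
        sub_points = [points[i] for i in members]
        rejected = 0
        for local_idx in range(len(members)):
            neighbours = knn(sub_points, local_idx, k)
            observed = Counter(batches[members[j]] for j in neighbours)
            stat = sum(
                (observed[b] - k * global_frac[b]) ** 2 / (k * global_frac[b])
                for b in labels
            )
            if chi2_sf_df1(stat) < alpha:
                rejected += 1
        rates[t] = rejected / len(members)
    return rates


# Hypothetical 1-D embedding: cell type A has two interleaved (well-mixed)
# batches; cell type B has its two batches in separate clusters.
points = [(float(i),) for i in range(20)]
points += [(float(i),) for i in range(10)] + [(100.0 + i,) for i in range(10)]
batches = [i % 2 for i in range(20)] + [0] * 10 + [1] * 10
cell_types = ["A"] * 20 + ["B"] * 20
print(celltype_kbet(points, batches, cell_types))  # {'A': 0.0, 'B': 1.0}
```

Restricting the neighbour search to cells of the same type is what lets the test tolerate unbalanced cell-type composition across batches: a batch that simply contains more of one cell type is not penalized, because the expected fractions are computed per type.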
Abstract: Single-cell RNA sequencing (scRNA-seq) allows researchers to examine the transcriptome at the single-cell level and has been increasingly employed as technologies continue to advance. Because of technical and biological factors unique to scRNA-seq data, denoising and batch-effect correction are almost indispensable for valid and powerful data analysis. However, various aspects of scRNA-seq data pose grand challenges for these essential data pre-processing, normalization and harmonization tasks. In this review, we first discuss, from a computational perspective, the properties of scRNA-seq data that make denoising and batch-effect correction challenging. We then review several state-of-the-art methods for dropout imputation and batch-effect correction, comparing their strengths and weaknesses. Finally, we benchmark three widely used correction tools on two hematopoietic scRNA-seq datasets to show their performance in a real data application.
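To make the normalization and harmonization steps concrete, here is a minimal sketch of two common baseline operations, assuming a cells × genes count matrix stored as nested lists: library-size normalization followed by log(1 + x), and per-batch mean-centering of each gene. These are deliberately naive stand-ins, not any of the imputation or correction tools the review benchmarks; the function names and the 10,000-count scale factor are illustrative choices. Per-batch centering also removes genuine biology when cell-type composition differs between batches, which is one reason dedicated methods exist.

```python
import math


def normalize_log1p(counts, scale=1e4):
    """Scale each cell (row) to `scale` total counts, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            normalized.append([0.0] * len(row))  # empty cell: nothing to scale
        else:
            normalized.append([math.log1p(x / total * scale) for x in row])
    return normalized


def center_per_batch(matrix, batches):
    """Subtract each gene's mean within the cell's batch (naive harmonization)."""
    n_genes = len(matrix[0])
    batch_means = {}
    for b in set(batches):
        rows = [row for row, label in zip(matrix, batches) if label == b]
        batch_means[b] = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
    return [
        [row[g] - batch_means[b][g] for g in range(n_genes)]
        for row, b in zip(matrix, batches)
    ]


# Hypothetical counts for two genes; the second cell was sequenced twice as deeply.
counts = [[10, 30], [20, 60], [5, 0]]
norm = normalize_log1p(counts)
print(norm[0] == norm[1])  # True: the depth difference is removed by normalization
```

The zero in the third cell stays zero after normalization; in scRNA-seq such a zero may be a dropout rather than true absence of expression, which is the problem dropout-imputation methods target and this sketch does not address.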
Funding: support from the National Research Foundation of Singapore (NRF-NSFC, Grant No. NRF2018NRF-NSFC003SB-006).
Abstract: Batch effects are technical sources of variation and can confound analysis. While many performance-ranking exercises have been conducted to establish the best batch effect-correction algorithm (BECA), we hold that the notion of "best" is context-dependent. Moreover, questions beyond the simplistic notion of "best" are also interesting: are BECAs robust against various degrees of confounding, and if so, what is the limit? Using two different methods for simulating class (phenotype) and batch effects, and taking various representative datasets from both genomics (RNA-Seq) and proteomics platforms, we demonstrate that when sample classes and batch factors are moderately confounded, most BECAs are remarkably robust and only weakly affected by upstream normalization procedures. This observation is consistently supported across the multitude of test datasets. BECAs do have limits: when sample classes and batch factors are strongly confounded, BECA performance declines, with variable performance in precision, recall and batch correction. We also report that while conventional normalization methods have minimal impact on batch-effect correction, they do not affect downstream statistical feature selection, and in strongly confounded scenarios, may even outperform BECAs. In other words, removing batch effects is no guarantee of optimal functional analysis. Overall, this study suggests that simplistic performance-ranking exercises are quite trivial, and that all BECAs are compromises in some context or another.
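Why confounding matters can be illustrated with a deliberately simple example, assuming noise-free data in which a feature equals an additive class effect plus an additive batch effect, and using per-batch mean-centering as a stand-in corrector. This corrector is not one of the BECAs evaluated in the study, and all names, designs and effect sizes below are hypothetical. With a balanced design the class difference survives correction; with partial confounding it is attenuated; when class and batch coincide it is removed entirely, because the two effects are then indistinguishable.

```python
def center_by_batch_mean(values, batches):
    """Naive batch correction: subtract each batch's mean from its samples."""
    means = {}
    for b in set(batches):
        group = [v for v, label in zip(values, batches) if label == b]
        means[b] = sum(group) / len(group)
    return [v - means[b] for v, b in zip(values, batches)]


def class_difference(values, classes):
    """Mean of class 1 minus mean of class 0."""
    one = [v for v, c in zip(values, classes) if c == 1]
    zero = [v for v, c in zip(values, classes) if c == 0]
    return sum(one) / len(one) - sum(zero) / len(zero)


def simulate(classes, batches, class_effect=2.0, batch_effect=5.0):
    """Noise-free feature: additive class effect plus additive batch effect."""
    return [class_effect * c + batch_effect * b for c, b in zip(classes, batches)]


# (classes, batches) for eight hypothetical samples split across two batches.
DESIGNS = {
    "balanced": ([0, 0, 1, 1, 0, 0, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1]),
    "moderate": ([0, 0, 0, 1, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1]),
    "confounded": ([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1]),
}

for name, (classes, batches) in DESIGNS.items():
    raw = simulate(classes, batches)
    corrected = center_by_batch_mean(raw, batches)
    print(name, class_difference(raw, classes), class_difference(corrected, classes))
# balanced 2.0 2.0
# moderate 4.5 1.5
# confounded 7.0 0.0
```

The true class effect is 2.0 in every design. Correctors that model the class label as a covariate can avoid the attenuation seen in the partially confounded case, but no method can separate the two effects in the fully confounded one.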
Abstract: Personalized medicine will improve health outcomes and patient satisfaction. However, implementing personalized medicine based on individuals' biological information is far from simple: it requires genetic biomarkers, which are mainly developed and used by pharmaceutical companies to select the patients who benefit more from, or have less risk of adverse drug reactions to, a particular drug. Genome-wide association studies (GWAS) aim to identify genetic variants across the human genome that might be used as genetic biomarkers for diagnosis and prognosis. Over the last several years, high-density SNP genotyping arrays have facilitated GWAS that successfully identified common genetic variants associated with a variety of phenotypes. However, each identified variant explains only a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Replication studies have shown that only a small portion of the loci associated in an initial GWAS can be replicated, even within the same population. Given the complexity of GWAS, multiple sources of Type I (false positive) and Type II (false negative) errors exist. Inconsistency in genotypes, caused either by the genotyping experiment or by the genotype-calling process, is a major source of false GWAS findings. Accurate and reproducible genotypes are paramount, because genotype inconsistency can inflate the number of false associations. This article reviews the sources of genotype inconsistency and discusses their effect on GWAS findings.
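One simple way to quantify the genotype inconsistency described above is to genotype (or call) the same sample twice and measure concordance between the two call sets. The sketch below assumes genotypes coded as 0/1/2 copies of the alternate allele with None for a no-call; the function name and coding are illustrative assumptions, not a specific tool's interface. Because no-calls are excluded, the number of SNPs actually compared is returned alongside the rate.

```python
def genotype_concordance(run_a, run_b):
    """Compare two genotype call sets for the same sample.

    Genotypes are coded 0/1/2 (copies of the alternate allele); None marks a
    no-call. SNPs that are a no-call in either run are skipped. Returns
    (concordance rate, number of SNPs compared, indices of discordant SNPs).
    """
    if len(run_a) != len(run_b):
        raise ValueError("call sets must cover the same SNPs")
    compared = 0
    discordant = []
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a is None or b is None:
            continue  # cannot judge consistency without both calls
        compared += 1
        if a != b:
            discordant.append(i)
    rate = 1.0 - len(discordant) / compared if compared else float("nan")
    return rate, compared, discordant


# Hypothetical calls for one sample from two genotyping/calling runs.
run_a = [0, 1, 2, 1, None, 0, 2, 1]
run_b = [0, 1, 1, 1, 2, 0, 2, 2]
rate, n, bad = genotype_concordance(run_a, run_b)
print(f"{rate:.3f} over {n} SNPs; discordant at {bad}")  # 0.714 over 7 SNPs; discordant at [2, 7]
```

SNPs flagged as discordant across replicate runs are natural candidates for closer inspection before they are carried into association testing.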
Abstract: Sucrose was crystallized from a highly viscous aqueous sucrose solution by seeded cooling in a batch crystallizer. At low seed loadings, the product crystal size distribution (CSD) was broad because of extensive secondary and primary nucleation. At high seed loadings, however, the CSD became unimodal: crystallization was dominated by seed growth, with practically no secondary nucleation. Sufficient seeding was thus effective in suppressing nucleation, even in batch crystallization from a highly viscous solution. The volume mean size of the product crystals obtained at high seed loadings agreed with that calculated from a simple mass balance assuming growth-dominated crystallization with no change in the number of crystals.
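The mass balance mentioned above can be written down explicitly. Assuming monodisperse seeds of size L_s and total mass W_s, no nucleation, breakage or agglomeration (so the number of crystals is constant), and geometrically similar growth (so crystal mass scales with size cubed), depositing a solute mass W_c gives W_s + W_c = W_s (L_p / L_s)^3, i.e. L_p = L_s (1 + W_c / W_s)^(1/3). The symbols, function name and numbers below are illustrative, not taken from the study.

```python
def growth_dominated_size(seed_size, seed_mass, crystallized_mass):
    """Product crystal size from a mass balance with a constant number of crystals.

    Assumes monodisperse seeds, no nucleation, breakage or agglomeration, and
    geometrically similar growth, so that
        L_p = L_s * (1 + W_c / W_s) ** (1/3)
    """
    if seed_size <= 0 or seed_mass <= 0 or crystallized_mass < 0:
        raise ValueError("seed size and mass must be positive; deposited mass non-negative")
    return seed_size * (1.0 + crystallized_mass / seed_mass) ** (1.0 / 3.0)


# Hypothetical numbers: 100 µm seeds, 10 g of seed, 70 g of sucrose deposited.
print(round(growth_dominated_size(100.0, 10.0, 70.0), 6))  # 200.0 (µm)
```

For a fixed deposited mass, a larger seed mass gives a smaller size increase per crystal, so the seed loading sets both the degree of nucleation suppression and the attainable product size.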