Abstract: Gene Set Analysis (GSA) is a framework for testing the association between a set of genes and an outcome, e.g. disease status or treatment group. The method relies on computing a maxmean statistic and estimating the null distribution of the maxmean statistic via a restandardization procedure. In practice, genes within a pre-determined gene set are more strongly correlated with each other than with genes in other sets, which may bias the estimated null distribution. We derive an asymptotic null distribution of the maxmean statistic under a sparsity assumption. We propose a flexible two-group mixture model for the maxmean statistics, which allows us to estimate the null parameters empirically by maximum likelihood. Our empirical method is compared with the restandardization procedure of GSA in simulations. We show that our method estimates the null density more accurately when genes are strongly correlated within gene sets.
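The abstract above names the maxmean statistic without defining it. As a minimal sketch, assuming the definition commonly given for GSA (the larger of the mean positive part and the mean negative part of the per-gene scores, each averaged over all genes in the set), it could be computed as follows. The function name `maxmean` and the example scores are hypothetical and are not taken from the paper.

```python
def maxmean(scores):
    """Maxmean statistic for one gene set (illustrative sketch).

    `scores` holds one test statistic per gene in the set (e.g. a
    t-like score). Positive and negative parts are each averaged over
    ALL genes in the set, and the larger average is returned.
    This is the unsigned form; some implementations keep the sign.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("gene set must contain at least one score")
    mean_pos = sum(max(s, 0.0) for s in scores) / n   # mean positive part
    mean_neg = sum(max(-s, 0.0) for s in scores) / n  # mean negative part
    return max(mean_pos, mean_neg)


if __name__ == "__main__":
    # Hypothetical per-gene scores for a four-gene set.
    print(maxmean([2.0, -1.0, 0.5, -0.5]))
```

Because both parts are divided by the full set size, a set with a few strongly positive genes is not cancelled out by mildly negative ones, which is the motivation usually given for preferring maxmean over a plain mean.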
Funding: Beijing Key Laboratory of Clinical Basic Research on Psoriasis of Traditional Chinese Medicine (No. BZ0375-KF201602).
Abstract: Objective: Using bioinformatics, gene set enrichment analysis (GSEA) and immune infiltration analysis were carried out on psoriasis microarray expression profile data to further understand the pathogenesis of psoriasis. Methods: GSE6710 chip data were obtained from the Gene Expression Omnibus (GEO), and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed using GSEA software. Gene expression matrices for 22 types of immune cells and the R packages were downloaded from the official CIBERSORT website; the immune cell infiltration matrix was obtained with R software and the related graphs were drawn. Results: Pathways related to cell proliferation and innate immunity, as well as some cancer-related pathways, were highly expressed in psoriatic lesions. Immune cell infiltration analysis showed that activated memory T cells, follicular helper T cells, M0 macrophages, and activated dendritic cells were up-regulated in the psoriatic skin lesion group, while inactive mast cells were down-regulated. Activated dendritic cells were positively correlated with follicular helper T cells, and activated mast cells were positively correlated with M0 macrophages. Inactive mast cells were negatively correlated with activated memory T cells, M1 macrophages were negatively correlated with regulatory T cells, and M0 macrophages were negatively correlated with inactive mast cells. Conclusion: Cell proliferation and innate immunity are of great significance in the pathogenesis of psoriasis. The immune cell infiltration analysis is generally consistent with the current model of psoriasis pathogenesis. Macrophages and mast cells also play a role in psoriasis.
Abstract: In gene prediction, Fisher discriminant analysis (FDA) is used to separate protein-coding regions (exons) from non-coding regions (introns). Usually, the positive and negative data sets are of the same size if enough data are available. When the data are insufficient or the two sets are unequal in size, however, the threshold used in FDA may strongly influence the prediction results. This paper presents a study on the selection of the threshold. The eigenvalue of each exon/intron sequence is computed using the Z-curve method with 69 variables. The experimental results suggest that the size of the data sets, their standard deviation, and the threshold are the three key elements to take into consideration to improve the prediction results.
Funding: Supported by the National Natural Science Foundation of China, Nos. 81471238, 81771327 (both to BYL); Construction of Central Nervous System Injury Basic Science and Clinical Translational Research Platform, Budget of Beijing Municipal Health Commission 2020, No. PXM2020_026280_000002 (to BYL).
Abstract: The heterogeneity of traumatic brain injury (TBI)-induced secondary injury has greatly hampered the development of effective treatments for TBI patients. Targeting common processes across species may be an innovative strategy to combat debilitating TBI. In the present study, a cross-species transcriptome comparison was performed for the first time to determine the fundamental processes of secondary brain injury in Sprague-Dawley rat and C57BL/6 mouse models of TBI, caused by acute controlled cortical impact. The RNA sequencing data from the mouse model of TBI were downloaded from the Gene Expression Omnibus (ID: GSE79441) at the National Center for Biotechnology Information. For the rat data, peri-injury cerebral cortex samples were collected for transcriptomic analysis 24 hours after TBI. Differentially expressed gene-based functional analysis revealed that common features between the two species were mainly involved in the regulation and activation of the innate immune response, including complement cascades as well as Toll-like and nucleotide oligomerization domain-like receptor pathways. These findings were further corroborated by gene set enrichment analysis. Moreover, transcription factor analysis revealed that the signal transducers and activators of transcription (STAT), basic leucine zipper (BZIP), Rel homology domain (RHD), and interferon regulatory factor (IRF) families of transcription factors play vital regulatory roles in the pathophysiological processes of TBI and are also largely associated with inflammation. These findings suggest that targeting the common innate immune response might be a promising therapeutic approach for TBI. The animal experimental procedures were approved by the Beijing Neurosurgical Institute Animal Care and Use Committee (approval No. 201802001) on June 6, 2018.
Funding: This study was supported by grants from Tongji Medical College, Huazhong University of Science and Technology (CN) (No. 5001540018) and the Young Scientists Fund (No. 81802676).
Abstract: Anaplastic thyroid carcinoma (ATC) is a rare but extremely lethal malignancy. However, little is known about the pathogenesis of ATC. Given its high mortality, it is critical to improve our understanding of ATC pathogenesis and to find new diagnostic biomarkers. In the present study, two gene microarray profiles (GSE53072 and GSE65144), which included 17 ATC and 17 adjacent non-tumorous tissues, were obtained. Bioinformatic analyses were then performed. Immunohistochemistry (IHC) and receiver operating characteristic (ROC) curves were then used to detect transmembrane protein 158 (TMEM158) expression and to assess diagnostic sensitivity. A total of 372 differentially expressed genes (DEGs) were identified. Through protein-protein interaction (PPI) analysis, we identified a significant module with 37 upregulated genes. Most of the genes in this module were related to cell-cycle processes. After co-expression analysis, 132 hub genes were selected for further study. Nine genes were identified as both DEGs and genes of interest in the weighted gene co-expression network analysis (WGCNA). IHC and ROC curves confirmed that TMEM158 was overexpressed in ATC tissue as compared with other types of thyroid cancer and normal tissue samples. We identified 8 KEGG pathways that were associated with high expression of TMEM158, including aminoacyl-tRNA biosynthesis and DNA replication. Our results suggest that TMEM158 may be a potential oncogene and serve as a diagnostic indicator for ATC.
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2016YFA0101203 to XB and 2016YFC1201801 to XZ); the National Natural Science Foundation of China (Grant Nos. 81372273 and 81773145 to XZ); funding from the Key Laboratory of Tumor Immunology and Pathology (Army Medical University), Ministry of Education of China (Grant No. 2017jszl09 to MC); and the Basic and Applied Fund of First Affiliated Hospital of Army Military Medical University (Grant Nos. SWH2016BZGFSBJ-04 and SWH2016JCZD-04 to XZ).
Abstract: Objective: Glioblastoma (GBM) is the most common primary malignant brain tumor regulated by numerous genes, with poor survival outcomes and unsatisfactory response to therapy. Therefore, a robust, multi-gene signature-derived model is required to predict the prognosis and treatment response in GBM. Methods: Gene expression data of GBM from TCGA and GEO datasets were used to identify differentially expressed genes (DEGs) through DESeq2 or LIMMA methods. The DEGs were then overlapped and used for survival analysis by univariate and multivariate Cox regression. Based on the gene signature of multiple survival-associated DEGs, a risk score model was established, and its prognostic and predictive role was estimated through Kaplan-Meier analysis and the log-rank test. Gene set enrichment analysis (GSEA) was conducted to explore high-risk score-associated pathways. Western blot was used for protein detection. Results: Four survival-associated DEGs of GBM were identified: OSMR, HOXC10, SCARA3, and SLC39A10. The four-gene signature-derived risk score was higher in GBM than in normal brain tissues. GBM patients with a high-risk score had poor survival outcomes. The high-risk group treated with temozolomide chemotherapy or radiotherapy survived for a shorter duration than the low-risk group. GSEA showed that the high-risk score was enriched with pathways such as vasculature development and cell adhesion. Western blot confirmed that the proteins of these four genes were differentially expressed in GBM cells. Conclusions: The four-gene signature-derived risk score functions well in predicting the prognosis and treatment response in GBM and will be useful for guiding therapeutic strategies for GBM patients.
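Abstract: Objective: To investigate the expression of centromere protein U (CENPU) in intestinal tissue from patients with colorectal cancer and, using bioinformatics, to analyze the effect of its expression level on patient prognosis. Methods: CENPU expression in tissue was verified by quantitative real-time polymerase chain reaction (qRT-PCR), Western blot (WB), and immunohistochemistry (IHC). Using the patients' clinical records, univariate and multivariate Cox regression were used to analyze the association between CENPU expression and the clinical parameters of colorectal cancer patients; receiver operating characteristic (ROC) curves and Kaplan-Meier survival curves were then plotted to assess the value of CENPU expression in predicting prognosis. Finally, bioinformatics analysis was used to explore the possible molecular mechanisms by which CENPU expression affects colorectal cancer progression. Results: qRT-PCR, WB, and IHC all showed that CENPU expression was significantly higher in cancer tissue from colorectal cancer patients than in normal tissue. Cox regression analysis showed that CENPU expression was significantly associated with patient age and TNM stage and was a risk factor for prognosis. Kaplan-Meier survival analysis showed that survival was significantly lower in colorectal cancer patients with high CENPU expression. ROC curve analysis showed that a model based on CENPU expression predicted the prognosis of colorectal cancer patients well. Bioinformatics analysis showed that ten genes (CENPI, CENPN, CENPD, CENPK, CENPP, CENPM, CENPQ, CENPH, NDC80, and ITGB3BP) interact with CENPU, and that CENPU participates in signaling pathways including DNA repair, MYC/TARGETS/V1, and PI3K/AKT/MTOR. Conclusion: High CENPU expression in colorectal cancer tissue is significantly associated with poor prognosis, suggesting that CENPU may be a potential target for early diagnosis and prognosis prediction in colorectal cancer patients.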
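Abstract: Objective: To explore, at the pathway level, the intervention mechanism of Bufei Yishen formula, an effective prescription for chronic obstructive pulmonary disease. Methods: An inflammatory response model was established by inducing macrophages with LPS. Using gene set enrichment analysis (GSEA), pathways related to the macrophage inflammatory response were screened, and pathways reversed after drug intervention were selected by normalized enrichment score (NES), in order to reveal the intervention mechanism of Bufei Yishen formula and its compatibility groups. Results: The NES of the herbs contained in Bufei Yishen formula was -1377.23, of which the kidney-tonifying group was -485.07, the blood-activating group -351.86, the phlegm-resolving group -303.71, and the qi-replenishing group -236.59. Bufei Yishen formula significantly reversed 213 pathways, of which the blood-activating group reversed 184, the kidney-tonifying group 147, the phlegm-resolving group 134, and the qi-replenishing group 133, with reversal rates of 75.41%, 60.25%, 54.92%, and 54.51%, respectively. Ninety pathways, including TGF-β production, were significantly reversed in all four compatibility groups. Pathways such as positive regulation of cytokine production involved in inflammatory response were reversed specifically by individual compatibility groups. Conclusion: The strength with which the compatibility groups of Bufei Yishen formula reversed inflammatory signaling pathways ranked, in descending order, kidney-tonifying, blood-activating, phlegm-resolving, and qi-replenishing; the number of reversed pathways ranked blood-activating, kidney-tonifying, phlegm-resolving, and qi-replenishing. Bufei Yishen formula may intervene in the inflammatory response by regulating reversed pathways that are common to, or specific to, its compatibility groups.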
Funding: Supported by the National Natural Science Foundation of China (Grant No. T2222007 to Xiaohui Wu).
Abstract: Gene set scoring (GSS) has been routinely conducted for gene expression analysis of bulk or single-cell RNA sequencing (RNA-seq) data, which helps to decipher single-cell heterogeneity and cell type-specific variability by incorporating prior knowledge from functional gene sets. Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is a powerful technique for interrogating single-cell chromatin-based gene regulation, and genes or gene sets with dynamic regulatory potentials can be regarded as cell type-specific markers, as in single-cell RNA-seq (scRNA-seq). However, there are few GSS tools specifically designed for scATAC-seq, and the applicability and performance of RNA-seq GSS tools on scATAC-seq data remain to be investigated. Here, we systematically benchmarked ten GSS tools, including four bulk RNA-seq tools, five scRNA-seq tools, and one scATAC-seq method. First, using matched scATAC-seq and scRNA-seq datasets, we found that the performance of GSS tools on scATAC-seq data was comparable to that on scRNA-seq, suggesting their applicability to scATAC-seq. Then, the performance of different GSS tools was extensively evaluated using up to ten scATAC-seq datasets. Moreover, we evaluated the impact of gene activity conversion, dropout imputation, and gene set collections on the results of GSS. Results show that dropout imputation can significantly improve the performance of almost all GSS tools, while the impact of gene activity conversion methods or gene set collections on GSS performance depends more on the GSS tool or dataset. Finally, we provided practical guidelines for choosing appropriate preprocessing methods and GSS tools in different application scenarios.