SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data (cited by: 4)

Abstract: Clustering is a prevalent approach to analyzing single cell RNA sequencing (scRNA-seq) data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose Spearman subsampling-clustering-classification (SSCC), a new clustering framework based on random projection and feature construction, for large-scale scRNA-seq data. SSCC greatly improves clustering accuracy, robustness, and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset of 68,578 human blood cells, SSCC achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory, compared to the widely used software package SC3. Compared to k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust.
Affiliation: BIOPIC
Source: Genomics, Proteomics & Bioinformatics (基因组蛋白质组与生物信息学报(英文版)); indexed in SCIE, CAS, CSCD; 2019, Issue 2, pp. 201-210 (10 pages)
Funding: Supported by grants from the Beijing Advanced Innovation Center for Genomics at Peking University, the Key Technologies R&D Program (Grant No. 2016YFC0900100) of the Ministry of Science and Technology of China, and the National Natural Science Foundation of China (Grant Nos. 81573022 and 31530036).
Keywords: Single cell RNA-seq; Clustering; Subsampling; Classification
Classification code: Q [Biology]
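
The subsampling-clustering-classification idea named in the abstract can be illustrated with a small sketch. This is not the sscClust implementation: it assumes plain k-means as the clustering step and a nearest-centroid rule based on Spearman correlation as the classification step, both chosen here only for illustration. All function names (`ranks`, `pearson`, `kmeans`, `sscc_sketch`) and the toy expression values are hypothetical.

```python
import random
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            out[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(a, b):
    """Pearson correlation; on rank vectors this equals Spearman correlation."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    return min(range(len(centers)), key=lambda c: sqdist(point, centers[c]))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [mean(col) for col in zip(*members)]
    return centers

def sscc_sketch(cells, k, n_sub, seed=0):
    """Subsample cells, cluster the subsample, then classify every cell."""
    rng = random.Random(seed)
    ranked = [ranks(cell) for cell in cells]           # rank-transform each cell
    sub_idx = rng.sample(range(len(cells)), n_sub)     # 1. subsampling
    centers = kmeans([ranked[i] for i in sub_idx], k)  # 2. clustering (assumed k-means)
    center_ranks = [ranks(c) for c in centers]
    # 3. classification: assign each cell to its most correlated centroid
    return [max(range(k), key=lambda c: pearson(r, center_ranks[c])) for r in ranked]

# Toy data (invented): rows are cells, columns are genes.
group_a = [[9, 8, 7, 1, 2, 0], [8, 9, 7, 0, 1, 2], [7, 9, 8, 2, 0, 1], [9, 7, 8, 1, 0, 2]]
group_b = [[1, 2, 0, 9, 8, 7], [0, 1, 2, 8, 9, 7], [2, 0, 1, 7, 9, 8], [1, 0, 2, 9, 7, 8]]
labels = sscc_sketch(group_a + group_b, k=2, n_sub=5)
print(labels)
```

Only the subsample goes through the clustering step, which is where a scheme of this kind would save time and memory on large datasets; the remaining cells need just one correlation per centroid. The cluster numbering in `labels` depends on which cells are drawn, so only the grouping is meaningful.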