Abstract
Single-cell RNA sequencing (scRNA-seq) has become a powerful tool for biomedical research, and the number of scRNA-seq datasets has been growing rapidly thanks to the continuous advancement of library preparation technologies. In addition to the increasing number of cells being profiled, there is a trend toward cross-tissue analyses, which build on initial efforts that studied one tissue at a time. The Human Cell Atlas [1] and the Human BioMolecular Atlas Program [2] aim to systematically characterize the expression profiles of various human organs and cell types, and to form comprehensive references for single-cell transcriptome data. However, because the data come from different sources, how to integrate these single-cell transcriptome datasets effectively and consistently remains a major challenge. Specifically, three issues need to be addressed. First, traditional relational databases cannot meet the requirements of efficient storage and retrieval of scRNA-seq datasets. Second, novel indexing methods are required so that each single cell can be traced by multiple attributes. Finally, a standardized controlled vocabulary is required to annotate cell types in a unified manner.