Gene selection is an indispensable step in analyzing noisy, high-dimensional single-cell RNA-seq (scRNA-seq) data. In contrast to the commonly used variance-based methods, a new feature selection method called HRG (Highly Regional Genes) is proposed. It mimics the way humans select markers in a 2D visualization of cells, finding informative genes that show regional expression patterns in the cell-cell similarity network. We mathematically find the optimal expression patterns that maximize the proposed scoring function. In comparison with several unsupervised methods, HRG shows high accuracy and robustness, and improves the performance of downstream cell clustering and gene correlation analysis. It is also applicable to selecting informative genes from sequencing-based spatial transcriptomic data.
Funding: Supported by the National Key Research and Development Program (2020YFA0712403, 2020YFA0906900), the National Natural Science Foundation of China (61922047, 81890993, 61721003, 62133006), and the BNRIST Young Innovation Fund (BNR2020RC01009).
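The abstract does not give the HRG scoring function, so the following is only a rough illustration of what "regional expression in a cell-cell similarity network" can mean, not the authors' method. As a stand-in, it scores each gene with Moran's I (a standard spatial autocorrelation statistic) on a k-nearest-neighbor graph of cells. The function names `knn_graph` and `regionality_scores`, the choice of `k`, and the synthetic data are all assumptions made for this sketch.

```python
import numpy as np


def knn_graph(X, k):
    """Symmetric binary kNN adjacency built from Euclidean distances.

    X: (n_cells, n_features) array, e.g. a low-dimensional embedding.
    This is a hypothetical helper, not part of any HRG implementation.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    np.fill_diagonal(d2, np.inf)                    # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]             # k nearest cells per row
    n = X.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                       # symmetrize


def regionality_scores(expr, W):
    """Moran's I per gene on the cell graph W (a stand-in score, not HRG).

    expr: (n_cells, n_genes) expression matrix.
    Returns one value per gene; constant genes yield NaN.
    """
    Z = expr - expr.mean(axis=0)            # center each gene
    num = np.sum(Z * (W @ Z), axis=0)       # sum_ij W_ij * z_i * z_j
    den = np.sum(Z ** 2, axis=0)
    den = np.where(den > 0, den, np.nan)
    return (expr.shape[0] / W.sum()) * num / den


# Synthetic example: two well-separated groups of cells in a 2D embedding.
rng = np.random.default_rng(0)
coords = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                    rng.normal(5.0, 0.5, (50, 2))])
# Gene 0 is expressed in the first group only; gene 1 is pure noise.
regional = np.concatenate([np.ones(50), np.zeros(50)]) + rng.normal(0, 0.1, 100)
noise = rng.normal(0.0, 1.0, 100)
expr = np.column_stack([regional, noise])

W = knn_graph(coords, k=10)
scores = regionality_scores(expr, W)
print(scores)  # the regionally expressed gene should score much higher
```

In this toy setting the gene confined to one region of the graph receives a score near 1, while the noise gene scores near 0. Ranking genes by such a score and keeping the top ones is one simple way to turn a regionality measure into a gene selection step.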