Abstract
Each cell expresses thousands of genes, so single-cell RNA sequencing (scRNA-seq) data suffer from the curse of dimensionality. In addition, low RNA capture rates cause expressed genes to go undetected, producing a large number of false zero counts, known as "dropout events", which make the data highly sparse. To address the high dimensionality and heavy dropout noise of scRNA-seq data, this paper integrates denoising and dimensionality reduction into the clustering task and proposes DGGAE, a clustering model based on graph convolutional neural networks. The model uses the negative log-likelihood of the zero-inflated negative binomial (ZINB) distribution as the loss function of a denoising autoencoder to handle dropout noise; uses a graph convolutional autoencoder to obtain low-dimensional features of the data; and uses the KL divergence as the clustering loss for deep embedded clustering. Experiments on 9 real high-dimensional, high-noise datasets show that DGGAE achieves better clustering performance than the traditional clustering methods it was compared with.
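As a rough illustration of the two loss terms named in the abstract, the following Python sketch evaluates a ZINB negative log-likelihood and a KL-divergence clustering loss with NumPy. The abstract gives no formulas, so the mean/dispersion parameterization (`mu`, `theta`, `pi`), the Student's t soft assignment, and the sharpened target distribution are assumptions borrowed from the usual deep embedded clustering setup. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
import numpy as np

# Element-wise log-gamma built from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma)


def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Mean negative log-likelihood of counts x under an assumed ZINB model.

    mu: NB mean, theta: NB dispersion, pi: probability of a structural zero.
    """
    x, mu, theta, pi = (np.asarray(a, dtype=float) for a in (x, mu, theta, pi))
    # Log of the negative binomial pmf in the mean/dispersion form.
    log_nb = (
        _lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
        + theta * (np.log(theta + eps) - np.log(theta + mu + eps))
        + x * (np.log(mu + eps) - np.log(theta + mu + eps))
    )
    # A zero is either a dropout (prob. pi) or a genuine NB zero.
    log_zero = np.log(pi + (1.0 - pi) * np.exp(log_nb) + eps)
    # A positive count can only come from the NB component.
    log_pos = np.log(1.0 - pi + eps) + log_nb
    return float(-np.mean(np.where(x < 0.5, log_zero, log_pos)))


def soft_assign(z, centers):
    """Student's t (1 degree of freedom) similarity of embeddings to cluster centers."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened targets: square q, normalize by soft cluster size, renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)


def kl_clustering_loss(q, eps=1e-10):
    """KL(P || Q) between the sharpened targets P and the soft assignments Q."""
    p = target_distribution(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


if __name__ == "__main__":
    counts = np.array([[0.0, 3.0], [1.0, 0.0]])
    print("ZINB NLL:", zinb_nll(counts, mu=1.5, theta=2.0, pi=0.3))

    z = np.array([[0.0, 0.0], [4.0, 0.0]])
    q = soft_assign(z, centers=z.copy())
    print("clustering loss:", kl_clustering_loss(q))
```

In a trained model `mu`, `theta`, and `pi` would be per-gene outputs of the decoder and `z` the latent embedding from the graph convolutional encoder; here they are fixed toy values so that the two loss terms can be inspected in isolation.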
Authors
KONG Chenxi (孔晨曦); LU Daying (鲁大营)
School of Cyber Science and Engineering, Qufu Normal University, 273165, Qufu, Shandong, PRC
Source
Journal of Qufu Normal University (Natural Science) (《曲阜师范大学学报(自然科学版)》)
CAS
2024, No. 4, pp. 83-89 (7 pages)
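Funding
Shandong Province Higher Education Science and Technology Program (J17KA062)
Ministry of Education Industry-University Cooperative Education Program (201602028014)
Shandong Province Graduate Education Quality Improvement Program (SDYKC19183)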
Keywords
graph convolutional neural network
dimensionality reduction
noise reduction
clustering
autoencoder