Abstract
To address two problems in predicting Differential Gene Expression (DGE) from large-scale Histone Modification (HM) data, namely that Cell type-Specificity (CS) information and the similarity and difference information between cell types are not properly used, and that the large input volume leads to high computational cost, a deep learning-based method named dcsDiff was proposed. First, multiple AutoEncoders (AEs) and Bi-directional Long Short-Term Memory (Bi-LSTM) networks were used to reduce the dimensionality of the HM signals and model them to obtain embedded representations. Then, multiple Convolutional Neural Networks (CNNs) were used to mine the combined effects of HMs within each cell type, as well as the similarity and difference information of each HM and the joint effects of all HMs between the two cell types. Finally, the two kinds of information were fused to predict DGE between the two cell types. In comparison experiments with DeepDiff on 10 pairs of cell types from the REMC (Roadmap Epigenomics Mapping Consortium) database, the Pearson Correlation Coefficient (PCC) of dcsDiff in DGE prediction increased by up to 7.2% and by 3.9% on average, the number of differentially expressed genes accurately detected increased by up to 36 and by 17.6 on average, and the running time was reduced by 78.7%. Component analysis experiments further confirmed the effectiveness of properly integrating the two kinds of information, and the parameters of dcsDiff were also determined experimentally. The experimental results show that dcsDiff can effectively improve the efficiency of DGE prediction.
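The abstract reports prediction quality as the Pearson Correlation Coefficient (PCC) between predicted and observed DGE values. As a reminder of what that metric measures, the following is a minimal sketch in plain Python. The function name `pearson_corr` and the toy numbers are assumptions made for illustration only; they are not the authors' code or data.

```python
import math

def pearson_corr(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)

# Toy values for illustration only; they are NOT data from the paper.
observed_dge = [1.2, -0.5, 0.3, 2.1, -1.4]
predicted_dge = [1.0, -0.2, 0.1, 1.8, -1.1]
pcc = pearson_corr(predicted_dge, observed_dge)
```

A value close to 1 means the predicted values rise and fall with the observed ones, so a higher PCC corresponds to a closer linear agreement between prediction and measurement.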
Authors
LI Xin (李昕); JIA Tao (贾韬)
College of Computer and Information Science, Southwest University, Chongqing 400715, China
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2022, Issue 11, pp. 3404-3412 (9 pages)
Funding
Supported by the Ministry of Education China University Industry-University-Research Innovation Fund (2021ALA03016).
Keywords
Histone Modification (HM)
Differential Gene Expression (DGE)
Cell type-Specificity (CS)
AutoEncoder (AE)
Bi-directional Long Short-Term Memory (Bi-LSTM) network
information fusion
epigenetics