Abstract
Transcriptional regulatory networks have long been an active research topic in systems biology and bioinformatics. Constructing these networks provides an important means of revealing the mechanisms of biochemical reactions within the cell. Current research in this field makes insufficient use of the available biological data and constructs gene transcriptional regulatory networks with low accuracy, especially on larger data sets. To address these problems, this paper makes full use of gene expression data, gene sequence data, and gene annotation data, and proposes DAXL (combined model with XGBoost and logistic regression based on deep autoencoder). Experiments on an Arabidopsis data set show that DAXL improves the accuracy of transcriptional regulatory network prediction and clearly outperforms the comparison methods.
Authors
刘晓燕
张诚诚
郭茂祖
邢林林
LIU Xiaoyan; ZHANG Chengcheng; GUO Maozu; XING Linlin (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China)
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2018, No. 7, pp. 1154-1161 (8 pages)
Funding
National Natural Science Foundation of China, Nos. 61571163, 61532014, 61671189, 91735306
National Key Research and Development Program of China, No. 2016YFC0901902