In the post-genomic era, research focus has shifted from sequence to function. Building a genetic regulatory network (GRN) helps to understand the regulatory mechanisms between genes and the functions of organisms. Probabilistic GRNs have received increasing attention recently. This paper discusses the Hidden Markov Model (HMM) as a tool for building GRNs. Different genes with similar expression levels are treated as different states when training the HMM. The probable regulatory genes of target genes can be identified from the resulting state transition matrix, and the determinate regulatory functions can be predicted using a nonlinear regression algorithm. Experiments on artificial and real-life datasets show the effectiveness of the HMM in building GRNs.
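As a rough illustration of reading candidate regulators off a state transition matrix, the sketch below ranks the states that transition into a target state with high probability. The matrix values, the gene names, the function name `candidate_regulators`, and the 0.3 threshold are all hypothetical; the abstract does not specify how the paper selects regulators, so this is only one plausible reading.

```python
def candidate_regulators(transition, states, target, threshold=0.3):
    """Return (state, probability) pairs for states that transition into
    `target` with probability >= threshold, strongest first.

    transition[i][j] is the probability of moving from state i to state j.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    j = states.index(target)
    hits = [
        (states[i], row[j])
        for i, row in enumerate(transition)
        if i != j and row[j] >= threshold  # skip the target's self-transition
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-state example; each row sums to 1.
states = ["geneA", "geneB", "geneC"]
transition = [
    [0.6, 0.1, 0.3],
    [0.2, 0.3, 0.5],
    [0.1, 0.1, 0.8],
]
regulators = candidate_regulators(transition, states, "geneC")
print(regulators)
```

In practice the transition matrix would come from a trained HMM rather than being written by hand, and the cutoff would need to be chosen against the data.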
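To further improve the efficiency of automatic function modeling with GEP, a naive gene expression programming model (Naïve GEP, NGEP) is proposed. The concept of an atomic gene fragment is introduced to protect well-evolved gene fragments, a gene grafting operation is added, and an NGEP prototype is implemented. Experiments show that NGEP converges 2 to 4 times faster than standard GEP in automatic function modeling.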
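Glioblastoma (GBM) is the most common primary intracranial tumor; it is highly malignant and patients have a very poor prognosis. To identify prognostic biomarkers for GBM and build a prognostic model, this study analyzed GBM expression profile data from The Cancer Genome Atlas (TCGA) database and screened for genes differentially expressed between GBM patients with different survival times. GISTIC software and Kaplan-Meier (KM) survival analysis were used to analyze the GBM copy number variation data in TCGA and identify survival-associated amplified genes (SAGs). The genes in the intersection of the genes upregulated in the short-survival group and the SAGs were subjected to univariate Cox regression and iterative Lasso regression to select important candidate genes and build a prognostic model. A prognostic score was calculated, and patients were divided into a high-risk group and a low-risk group by the median score. ROC curves were used to assess model quality, KM survival analysis was used to compare prognosis between the high-risk and low-risk groups, and three external datasets from the GEO, CGGA, and Rembrandt databases were used for validation. Multivariate Cox regression was used to test whether the prognostic score is an independent prognostic factor. The differential analysis across survival groups yielded 426 upregulated genes and 65 downregulated genes. Intersecting the upregulated genes of the short-survival group with the SAGs gave 47 genes. After screening, a six-gene prognostic model (EN2, PPBP, LRRC61, SEL1L3, CPA4, DDIT4L) was established. The area under the ROC curve exceeded 0.6 in the TCGA experimental group and in all three external validation groups, reaching as high as 0.912. KM analysis showed a difference in prognosis between the high-risk and low-risk groups in every dataset (P<0.05). In multivariate Cox regression, the six-gene prognostic score was an independent factor affecting the prognosis of GBM patients (P<0.05). In summary, this study established a six-gene (EN2, PPBP, LRRC61, SEL1L3, CPA4, DDIT4L) GBM prognostic model with good predictive ability, which can serve as an independent prognostic marker for GBM patients.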
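The scoring and grouping step described above can be sketched as follows, assuming the prognostic score is the usual linear combination of regression coefficients and expression values (the abstract does not state the formula). The coefficients, expression values, patient IDs, and function names are made up for illustration; they are not the fitted values from the study, and only two of the six genes are used to keep the example short.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear prognostic score per patient: sum of coefficient * expression
    over the genes in the model (assumed form, not confirmed by the source)."""
    return {
        patient: sum(coefficients[gene] * values[gene] for gene in coefficients)
        for patient, values in expression.items()
    }


def split_by_median(scores):
    """Label each patient 'high' if the score is above the median, else 'low'."""
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"EN2": 0.5, "PPBP": 0.25}
expression = {
    "P1": {"EN2": 2.0, "PPBP": 4.0},
    "P2": {"EN2": 4.0, "PPBP": 8.0},
    "P3": {"EN2": 0.0, "PPBP": 4.0},
    "P4": {"EN2": 6.0, "PPBP": 4.0},
}
scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
print(scores)
print(groups)
```

The resulting group labels are what a Kaplan-Meier comparison or a multivariate Cox regression would then take as input.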
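To address the insufficient classification accuracy, overfitting, and high computational complexity of classification models on small-sample, high-dimensional gene expression datasets, an improved model, Two Boosting Deep Forest (TBDForest), is proposed. In the multi-grained scanning stage, an equal feature utilization method transforms the original features. During classification, the fitting quality of each forest in the ensemble is taken into account, and the most important discriminative features from the upper layer are passed to the next cascade layer, which improves the class distribution between layers. The original cascade layer is replaced with a sub-layer cascade structure, which gives samples more training opportunities, reduces training cost, and avoids the model's dependence on parameters. Validation on small-sample gene expression datasets for five diseases shows that the improved model achieves better classification performance on small-sample datasets.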
Funding: Key Technologies R&D Program of Guangdong Province, China (Grant No. G03B2040770); Natural Science Foundation of Guangdong Province, China (Grant No. B6480598); Natural Science Foundation of Hunan Province, China (Grant No. 05JJ30122).