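Abstract
Objective Transcriptome sequencing provides a powerful means of studying the physiological state of specific tissues and cells and the changes occurring at the molecular level. To relate molecular-level changes to the physiological functions of tissues and cells while excluding the interference of random factors, an expression pattern analysis method based on serial RNA-seq data is needed. Methods This paper proposes an integrated approach, the gene expression cluster method (GECluster), for pattern clustering of serial samples. Curve fitting and information entropy are combined to build a model and extract feature attributes, and genes are then clustered hierarchically according to the feature attributes the model provides. Results Genes with consistent expression trends were grouped well into the same cluster. Functional enrichment analysis showed that these genes are strongly related in function, in agreement with published reports. Conclusion GECluster can mine co-expressed genes from multi-sample serial RNA-seq data more flexibly and objectively, providing a more effective approach for subsequent functional analysis.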
Objective Transcriptome sequencing plays an increasingly important role in biological science, providing the most direct evidence of the relationship between physiological state and molecular changes. This study examined a new method for analyzing biological information from serial RNA-seq data. Methods The gene expression cluster method (GECluster) was used to cluster a series of samples. A model was established to identify gene features using polynomial fitting and Shannon's entropy, and a hierarchical clustering algorithm was then applied to those features. Functional enrichment analysis (FEA) was performed on specific categories obtained by GECluster. Results Genes with similar expression patterns fell into the same category. FEA showed that these genes were strongly associated in function, consistent with reports in the literature. Conclusion GECluster provides an objective and flexible method for identifying co-expressed genes from RNA-seq data.
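The pipeline outlined in the Methods (polynomial fitting, Shannon's entropy, then hierarchical clustering on the resulting features) can be illustrated with a minimal sketch. It is not the GECluster implementation: the abstract does not state the polynomial degree, the normalization, how the entropy is computed, or the linkage criterion, so the degree-2 fit, peak scaling, entropy over each gene's share of expression across samples, average linkage, and all function names below are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def shannon_entropy(expr):
    """Shannon entropy (bits) of one gene's expression shares across samples."""
    expr = np.asarray(expr, dtype=float)
    total = expr.sum()
    if total <= 0:
        return 0.0  # assumption: an unexpressed gene gets zero entropy
    p = expr / total
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-(p * np.log2(p)).sum())


def gene_features(expr, degree=2):
    """Feature vector: polynomial coefficients of the peak-scaled profile, plus entropy."""
    expr = np.asarray(expr, dtype=float)
    x = np.linspace(0.0, 1.0, expr.size)  # assumption: evenly spaced samples
    peak = expr.max()
    scaled = expr / peak if peak > 0 else expr
    coeffs = np.polyfit(x, scaled, degree)
    return np.append(coeffs, shannon_entropy(expr))


def cluster_genes(matrix, n_clusters, degree=2):
    """Hierarchically cluster genes (rows) on their fitted-curve and entropy features."""
    feats = np.array([gene_features(row, degree) for row in matrix])
    tree = linkage(feats, method="average", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data (invented for illustration): two rising and two falling profiles.
matrix = np.array([
    [1, 2, 4, 8, 16],
    [2, 4, 8, 16, 32],
    [16, 8, 4, 2, 1],
    [32, 16, 8, 4, 2],
])
labels = cluster_genes(matrix, n_clusters=2)
print(labels)
```

On the toy matrix, the two rising profiles share one label and the two falling profiles share the other, because peak scaling makes the fitted coefficients depend on the shape of the trend rather than on absolute expression level.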
Source
《医学分子生物学杂志》
CAS
2013, Issue 1, pp. 41-46 (6 pages)
Journal of Medical Molecular Biology
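Funding
National Natural Science Foundation of China (No. 61170154, 91129710); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20102307120027, 20102307110022)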
Keywords
transcriptome sequencing
curve fitting
information entropy
cluster analysis
RNA-seq
polynomial fitting
Shannon's entropy
hierarchical clustering