Towards integrated oncogenic marker recognition through mutual information-based statistically significant feature extraction: an association rule mining based study on cancer expression and methylation profiles (cited by: 5)

Abstract

Background: Marker detection is an important task in complex disease studies. Here we provide an association rule mining (ARM) based approach for identifying integrated markers through mutual information (MI) based statistically significant feature extraction, and apply it to acute myeloid leukemia (AML) and prostate carcinoma (PC) gene expression and methylation profiles.

Methods: We first collect the genes that have both expression and methylation values in AML as well as PC. Next, we run the Jarque-Bera normality test on the expression/methylation data to divide the whole dataset into two parts: one that follows a normal distribution and one that does not. This gives four parts of the dataset: normally distributed expression data, normally distributed methylation data, non-normally distributed expression data, and non-normally distributed methylation data. A feature-extraction technique, "mRMR", is then applied to each part, producing a list of top-ranked genes. Next, we apply the Welch t-test (parametric test) and the Shrink t-test (non-parametric test) to the expression/methylation data for the top selected normally distributed genes and non-normally distributed genes, respectively. We then use a recent weighted ARM method, "RANWAR", to combine all/specific resultant genes and generate top oncogenic rules along with the respective integrated markers. Finally, we perform a literature search as well as KEGG pathway and Gene Ontology (GO) analyses using the Enrichr database for in silico validation of the prioritized oncogenes as markers, and label the markers as existing or novel.

Results: The novel markers of AML are {ABCB11↑ ∪ KRT17↓} (i.e., ABCB11 as up-regulated and KRT17 as down-regulated) and {AP1S1- ∪ KRT17↓ ∪ NEIL2- ∪ DYDC1↓} (i.e., AP1S1 and NEIL2 both as hypo-methylated, and KRT17 and DYDC1 both as down-regulated). The novel marker of PC is {UBIAD1 ∪ APBA2 ∪ C4orf31} (i.e., UBIAD1 as up-regulated and hypo-methylated, and APBA2 and C4orf31 both as down-regulated and hyper-methylated).

Conclusion: The identified novel markers might have critical roles in AML as well as PC. The approach can be applied to other complex diseases.
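The normality split and the parametric test named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_by_normality` and `welch_pvalues`, the genes-by-samples array layout, and the 0.05 threshold are all assumptions made for illustration, and the mRMR ranking, the Shrink t-test, and the RANWAR rule mining steps are not shown.

```python
import numpy as np
from scipy import stats


def split_by_normality(data, alpha=0.05):
    """Return a boolean mask over genes (rows of `data`).

    True means the Jarque-Bera test does not reject normality at level
    `alpha` (an assumed threshold), so the gene goes to the "normally
    distributed" part; False sends it to the "non-normal" part.
    """
    pvals = []
    for row in data:
        _, p = stats.jarque_bera(row)
        pvals.append(p)
    return np.array(pvals) >= alpha


def welch_pvalues(data, is_case):
    """Per-gene Welch t-test (unequal variances) of case vs control samples.

    `is_case` is a boolean vector over samples (columns of `data`).
    """
    case = data[:, is_case]
    control = data[:, ~is_case]
    _, p = stats.ttest_ind(case, control, axis=1, equal_var=False)
    return p
```

A typical use would be to compute the mask once per data type (expression, methylation), rank each resulting part separately, and then call `welch_pvalues` only on the top-ranked rows where the mask is True.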
Source: Frontiers of Electrical and Electronic Engineering in China (中国电气与电子工程前沿(英文版)), CSCD, 2017, Issue 4, pp. 302-327 (26 pages)
Keywords: integrated markers; feature extraction; statistical test; rule mining