Abstract
Using statistical and data-mining methods, this paper studies a colon cancer gene expression profile and presents the solution procedure and results obtained with the GB index, a BP neural network, and the wavelet transform. First, irrelevant genes are screened out with the GB composite index, and the intersection of two candidate gene sets (114 genes) is taken as the set of informative genes, which reduces the gene dimension. Second, redundant genes are removed on the basis of the strong correlation between genes. A BP neural network is used to count misclassifications, and the gene feature group with the lowest misclassification rate and the fewest genes is selected. The mean impact value (MIV) method is then applied to screen the genes further, and the misclassification count is computed again; a subset of 12 genes is finally identified as the optimal gene combination. Third, each set of gene expression values is treated as a gene signal and denoised with the wavelet transform, after which the number of feature genes falls to 8.
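The correlation-based removal of redundant genes mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the correlation measure, the cutoff, or the order in which genes are compared, so the Pearson coefficient, the 0.9 threshold, the greedy keep-first rule, and the names `pearson` and `drop_redundant` are all assumptions made for illustration. The toy data is invented and is not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)


def drop_redundant(genes, threshold=0.9):
    """Greedy filter (hypothetical rule, not the paper's stated criterion).

    `genes` maps a gene name to its expression values across samples and is
    assumed to be ordered from most to least informative. A gene is kept
    only if |r| against every already-kept gene is below `threshold`.
    """
    kept = []
    for name, profile in genes.items():
        if all(abs(pearson(profile, genes[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy expression profiles, invented for illustration only.
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.1, 4.0, 6.2, 8.0],  # nearly proportional to g1, so redundant
    "g3": [4.0, 1.0, 3.0, 2.0],
}
selected = drop_redundant(toy)
print(selected)
```

In this sketch the first gene of a strongly correlated pair is retained, which only makes sense if the input is already ranked, for example by a relevance score from an earlier screening step.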
Source
《数学的实践与认识》
CSCD
Peking University Core Journal (北大核心)
2011, Issue 14, pp. 47-58 (12 pages)
Mathematics in Practice and Theory