Gene selection (feature selection) is generally performed in gene space (feature space), where a severe curse-of-dimensionality problem arises because the number of genes is much larger than the number of samples in gene space (G-space). This makes the data set difficult to model in this space and lowers confidence in the gene selection results. How to find a gene subset in this case is a challenging problem. In this paper, G-space is transformed into its dual space, referred to as class space (C-space), such that the number of dimensions in C-space equals the number of classes of the samples in G-space, and the number of samples in C-space equals the number of genes in G-space. Because the number of classes is far smaller than the number of genes, the curse of dimensionality does not arise in C-space. A new gene selection method, based on the principle of separating different classes as far as possible, is presented with the help of Principal Component Analysis (PCA). The experimental results of gene selection on a real data set are evaluated with the Fisher criterion, the weighted Fisher criterion, and leave-one-out cross-validation, showing that the presented method is effective and efficient.
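The abstract does not spell out how C-space is built or how PCA is used to rank genes, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes that each gene becomes one C-space point whose coordinates are its z-scored class-mean expression values, and that a gene's score is its squared distance from the C-space centroid along the leading principal components. The function names (`class_space`, `pca_scores`, `fisher_criterion`), the synthetic data, and the scoring rule are all hypothetical. The Fisher criterion shown is one common multi-class form (between-class scatter over within-class scatter), which may differ from the paper's definition.

```python
import numpy as np

def class_space(X, y):
    """Map G-space data (samples x genes) to an assumed C-space (genes x classes)."""
    classes = np.unique(y)
    # Assumption: z-score each gene across samples so genes are comparable.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # One point per gene, one coordinate per class: the class-mean expression.
    return np.stack([Z[y == c].mean(axis=0) for c in classes], axis=1)

def pca_scores(C, n_components):
    """Hypothetical score: squared norm of each gene's projection on the leading PCs."""
    Cc = C - C.mean(axis=0)                      # center the gene points
    _, _, Vt = np.linalg.svd(Cc, full_matrices=False)
    proj = Cc @ Vt[:n_components].T              # coordinates in PC space
    return (proj ** 2).sum(axis=1)

def fisher_criterion(X, y):
    """Per-gene ratio of between-class scatter to within-class scatter."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

# Synthetic stand-in for a real expression data set (illustration only).
rng = np.random.default_rng(0)
n_per_class, n_classes, n_genes, n_informative = 15, 3, 60, 6
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(size=(y.size, n_genes))
for j in range(n_informative):
    X[y == j % n_classes, j] += 6.0              # gene j is shifted in one class

C = class_space(X, y)                            # 60 gene points in 3 dimensions
scores = pca_scores(C, n_components=n_classes - 1)
selected = np.argsort(scores)[::-1][:n_informative]
fisher = fisher_criterion(X, y)
print(C.shape, sorted(selected.tolist()))
```

In G-space this toy problem has 45 samples in 60 dimensions, whereas in C-space it has 60 points in 3 dimensions, which is the reversal of the sample-to-dimension ratio that the abstract describes. The Fisher values of the selected genes can then be compared with those of the remaining genes as a rough check on the selection.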