Statistical classification methods are frequently applied to analyze metabolomics data, especially data from medicinal plants. Combined with variable selection techniques, they allow us to identify marker candidates that can be used to determine the group to which unknown subjects belong. After preprocessing steps such as outlier checking, normalization, missing value imputation and transformation, we mainly used four novel classification methods: RF (random forest), NSC (nearest shrunken centroid), PLS-DA (partial least squares discriminant analysis) and SAM (significance analysis of microarrays). Each method has its own measure of the importance of a single metabolite, so it is possible to select the highly ranked metabolites that give the best prediction accuracy. Adopting the above strategy, we have successfully analyzed several kinds of metabolomics data, including data from Panax ginseng, Lespedeza species, Anemarrhena asphodeloides and Gastrodia elata.
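To make the idea of a per-metabolite importance measure concrete, the following is a minimal sketch of one of the four methods named in the abstract, the nearest shrunken centroid. It is not the authors' implementation: the function names (`fit_nsc`, `selected_features`, `predict`), the threshold `delta = 1.0`, the equal class priors and the small deterministic toy data set are all assumptions made for illustration. Each column of the toy data stands for one metabolite, and metabolites whose shrunken score is zero in every group are dropped.

```python
import math
import statistics


def soft_threshold(value, delta):
    """Shrink value toward zero by delta; anything within delta of zero becomes 0."""
    return math.copysign(max(abs(value) - delta, 0.0), value)


def fit_nsc(X, y, delta):
    """Fit a simplified nearest shrunken centroid model (equal priors assumed)."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    members = {k: [row for row, label in zip(X, y) if label == k] for k in classes}
    centroid = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
                for k, rows in members.items()}
    # Pooled within-class standard deviation of each feature.
    s = [math.sqrt(sum((row[i] - centroid[label][i]) ** 2 for row, label in zip(X, y))
                   / (n - len(classes)))
         for i in range(p)]
    s0 = statistics.median(s)  # offset that guards against very small deviations
    shrunken_d, shrunken_centroid = {}, {}
    for k in classes:
        # Standard-error factor of (class centroid - overall centroid).
        m_k = math.sqrt(1.0 / len(members[k]) - 1.0 / n)
        scale = [m_k * (s[i] + s0) for i in range(p)]
        shrunken_d[k] = [soft_threshold((centroid[k][i] - overall[i]) / scale[i], delta)
                         for i in range(p)]
        shrunken_centroid[k] = [overall[i] + scale[i] * shrunken_d[k][i]
                                for i in range(p)]
    return {"d": shrunken_d, "centroid": shrunken_centroid, "s": s, "s0": s0}


def selected_features(model):
    """Indices of features that survive shrinkage, strongest first."""
    p = len(model["s"])
    strength = [max(abs(d[i]) for d in model["d"].values()) for i in range(p)]
    kept = [i for i in range(p) if strength[i] > 0.0]
    return sorted(kept, key=lambda i: -strength[i])


def predict(model, x):
    """Assign x to the class with the nearest shrunken centroid."""
    def distance(k):
        return sum(((x[i] - c) / (model["s"][i] + model["s0"])) ** 2
                   for i, c in enumerate(model["centroid"][k]))
    return min(model["centroid"], key=distance)


# Hypothetical toy data: 2 groups x 10 samples, 4 "metabolites".
# Group B differs strongly in features 0 and 1, barely in 2, not at all in 3.
base = [-1.0, -0.5, 0.0, 0.5, 1.0] * 2
shift_b = [2.0, -1.5, 0.25, 0.0]
X = [[base[(j + i) % 10] for i in range(4)] for j in range(10)]
X += [[base[(j + i) % 10] + shift_b[i] for i in range(4)] for j in range(10)]
y = ["A"] * 10 + ["B"] * 10

model = fit_nsc(X, y, delta=1.0)
ranked = selected_features(model)
print("marker candidates (feature indices):", ranked)
print("predicted group:", predict(model, [1.8, -1.2, 0.1, 0.0]))
```

In practice the threshold would be chosen by cross-validated prediction accuracy rather than fixed in advance, and the other three methods would supply their own rankings (for example, variable importance in a random forest) for comparison.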