High Dimension Multivariate Data Analysis for Small Group Samples of Chemical Volatile Profiles of African Nightshade Species

Abstract: Quantitative headspace analysis of volatiles emitted by plants or other living organisms in chemical ecology studies generates large multidimensional data sets that require extensive mining and refining to extract useful information. Often the number of variables (the quantified volatile compounds) exceeds the number of observations or samples, so many traditional statistical methods become inefficient. Here, we employed a machine learning algorithm, random forest (RF), in combination with a distance-based procedure, similarity percentage (SIMPER), as preprocessing steps to reduce the dimensionality of the chemical volatile profiles of three African nightshade plant species before subjecting the data to non-metric multidimensional scaling (NMDS). In addition, two non-parametric methods, permutational multivariate analysis of variance (PERMANOVA) and analysis of similarities (ANOSIM), were applied to test the hypothesis of differences among the African nightshade species based on their volatile profiles and to confirm the patterns revealed by the NMDS plots. Our results showed significant differences among the African nightshade species when the dimension of the data was reduced using RF variable importance and SIMPER. This was supported by NMDS plots in which S. scabrum was separated from S. villosum and S. sarrachoides on the basis of the reduced set of variables. The novelty of our work lies in the merits of using data reduction techniques to reveal group differences that might not have been detected had the analysis been performed on the entire original data matrix, which is characterized by small samples. The R code used in the analysis is shared herein so that interested researchers can customise it for their own data of a similar nature.
Authors: Lorna Chepkemoi; Daisy Salifu; Lucy Kananu Murungi; Henri E. Z. Tonnang (International Centre of Insect Physiology and Ecology (ICIPE), Nairobi, Kenya; Department of Horticulture and Food Security, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya)
Source: Journal of Data Analysis and Information Processing, 2024, Issue 2, pp. 210-231 (22 pages)
Keywords: Random Forest; Similarity Percentage; PERMANOVA; ANOSIM; Non-Metric Multi-Dimensional Scaling
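
The PERMANOVA test named in the abstract can be illustrated with a minimal sketch. This is not the authors' R code: it is a small NumPy version written for illustration, assuming Bray-Curtis dissimilarity and the standard pseudo-F statistic computed from squared distances. The function names, the placeholder group labels (`sp1`, `sp2`, `sp3`) and the data are hypothetical, and the data are synthetic random numbers, not measurements from the study.

```python
import numpy as np

def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity between rows of a positive matrix."""
    X = np.asarray(X, dtype=float)
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return num / den

def pseudo_f(D, labels):
    """PERMANOVA pseudo-F computed from a square distance matrix D."""
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    a = len(groups)
    D2 = D ** 2
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = D2[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares, accumulated group by group.
    ss_within = 0.0
    for g in groups:
        idx = np.where(labels == g)[0]
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(D, labels, n_perm=999, seed=0):
    """Permutation p-value for pseudo-F obtained by shuffling group labels."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    f_obs = pseudo_f(D, labels)
    f_perm = np.array(
        [pseudo_f(D, rng.permutation(labels)) for _ in range(n_perm)]
    )
    p = (np.sum(f_perm >= f_obs) + 1) / (n_perm + 1)
    return f_obs, p

# Synthetic example (assumption): 3 groups x 4 samples, 30 "compounds",
# so variables outnumber samples as in the setting the abstract describes.
rng = np.random.default_rng(42)
labels = np.repeat(["sp1", "sp2", "sp3"], 4)
X = rng.gamma(shape=2.0, scale=1.0, size=(12, 30))
X[:4, :5] *= 10.0  # make the first group differ in five compounds

D = bray_curtis(X)
f_obs, p_value = permanova(D, labels, n_perm=199)
print(f"pseudo-F = {f_obs:.2f}, permutation p = {p_value:.3f}")
```

In the workflow the abstract outlines, the columns of `X` would first be reduced to the variables retained by RF variable importance and SIMPER. The sketch skips that step and only shows how the permutation test itself operates on a distance matrix.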