Abstract
Objective: To establish a weighted probabilistic principal component analysis (PPCA) model, evaluate the candidate models in simulation experiments, select the best model for metabolomics data analysis, and thereby provide a noise-reducing, optimized analysis method for metabolomics data.
Methods: The jackknife was used to compute confidence intervals and coefficients of variation for the variable loadings. Using this information on loading variability, three weighting schemes (reciprocal, square-root, and logarithmic) were designed to weight the variables in the original data, and each was combined with PPCA to form a weighted PPCA model. The models were evaluated in simulation experiments on estimation of the first principal component loading vector and on predictive performance, and the best weighted PPCA model was selected. Principal component score plots were drawn for metabolomics data, and the distances between group centres in the score plots were used to compare how well weighted PPCA and PPCA separate the groups visually.
Results: The reciprocal weighting model outperformed the other two weighted models in both estimation of the first loading vector and predictive performance. In visualization, weighted PPCA not only reduced the uncertainty of the model estimates but also increased the centre distance between groups.
Conclusion: The weighted PPCA model constructed here outperforms the PPCA model in result interpretation and visualization, and it provides a narrower reference range for screening differential variables.
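The weighting step described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the weight formulas, so the three forms in `loading_weights` (1/CV, 1/sqrt(CV), 1/log(1+CV)) are hypothetical stand-ins for the reciprocal, square-root, and logarithmic schemes. The first loading is taken from an ordinary SVD-based PCA rather than a fitted PPCA model, and the confidence interval uses a normal approximation (estimate ± 1.96 standard errors). All function names are invented for the example.

```python
import numpy as np


def first_loading(X):
    """Unit-length first principal component loading of column-centred X."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def jackknife_loading_cv(X, eps=1e-12):
    """Leave-one-out jackknife for the first loading vector.

    Returns the full-sample loading, an approximate 95% confidence
    interval (normal approximation, an assumption of this sketch),
    and the coefficient of variation of each variable's loading.
    """
    n, p = X.shape
    ref = first_loading(X)
    reps = np.empty((n, p))
    for i in range(n):
        v = first_loading(np.delete(X, i, axis=0))
        # Eigenvector signs are arbitrary, so align each replicate with ref.
        reps[i] = v if v @ ref >= 0 else -v
    # Jackknife standard error: sqrt((n - 1) / n * sum of squared deviations).
    se = np.sqrt((n - 1) / n * ((reps - reps.mean(axis=0)) ** 2).sum(axis=0))
    ci = np.stack([ref - 1.96 * se, ref + 1.96 * se])
    cv = se / (np.abs(ref) + eps)
    return ref, ci, cv


def loading_weights(cv, kind="reciprocal", eps=1e-12):
    """Hypothetical weight forms; each decreases as the loading CV grows."""
    cv = np.asarray(cv, dtype=float) + eps
    if kind == "reciprocal":
        return 1.0 / cv
    if kind == "sqrt":
        return 1.0 / np.sqrt(cv)
    if kind == "log":
        return 1.0 / np.log1p(cv)
    raise ValueError(f"unknown weighting scheme: {kind}")


def weight_data(X, w):
    """Centre the columns of X and scale each variable by its weight."""
    return (X - X.mean(axis=0)) * w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.normal(size=60)           # one latent factor
    X = rng.normal(size=(60, 6))      # six variables of pure noise
    X[:, 0] += 3 * t                  # two variables carry the signal
    X[:, 1] -= 3 * t
    _, _, cv = jackknife_loading_cv(X)
    print(loading_weights(cv, "reciprocal"))
```

In this sketch, variables whose loadings are unstable across jackknife replicates get a large CV and are down-weighted before the component model is fitted to the output of `weight_data`, which is the noise-reduction idea the abstract describes.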
Authors
Gao Bing (高兵)
Sun Lin (孙琳)
Xie Biao (谢彪)
Wang Wenji (王文佶)
Qu Siyang (曲思杨)
Liu Meina (刘美娜)
Zhang Qiuju (张秋菊)
Gao Bing; Sun Lin; Xie Biao (Department of Medical Statistics, Harbin Medical University (150081), Harbin)
Source
《中国卫生统计》 (Chinese Journal of Health Statistics)
CSCD
Peking University Core Journals (北大核心)
2018, Issue 6, pp. 802-805 (4 pages)
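Funding
National Natural Science Foundation of China (81502889)
Key Program of the Natural Science Foundation of Heilongjiang Province (ZD201314)
Project of the Health Department of Heilongjiang Province (2012-770)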
Keywords
Metabolomics (代谢组学)
Variable selection (变量筛选)
Probabilistic principal component analysis (概率主成分分析)
Weight (权重)