Abstract
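Data normalization is an important part of preprocessing. Using urine NMR spectra from normal rats and type I diabetic model rats as test data, this paper studies how three data preprocessing methods, line normalization (Inf-Norm), area normalization (1-Norm), and modulus normalization (2-Norm), affect the results of PCA on metabonomic data. The analysis shows that area normalization separates the control and diabetic samples better on the PC score plot. In addition, to selectively remove noise variables from the metabonomic data set, the paper introduces a new parameter R to evaluate the classification quality of the PC score plot, incorporates it into the fitness function, and designs a corresponding genetic algorithm to perform variable selection on the metabonomic data. After variable selection, samples of different classes are more separable on the principal component score plot, and the number of variables is greatly reduced, which makes it easier to label and identify characteristic metabolites.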
Normalization is one of the most important steps in metabonomic data preprocessing. In this study, we compared the effects of three normalization methods on pattern recognition results, and we evaluated an evolutionary variable selection method for improving the quality of data clustering. Three normalization methods, i.e. Inf-Norm, 1-Norm and 2-Norm, were tested on metabonomic data sets composed of urine NMR spectra from normal and type I diabetic rats. The choice of method was found to greatly affect the outcome of the data analysis, and the 1-Norm method performed better than the other two. In addition, a parameter R was defined to evaluate the quality of the PC score plot and was introduced into the fitness function of a genetic algorithm (GA). Using the GA for variable selection was found to improve the data clustering quality. After GA selection, part of the variables were discarded, which made it easier to identify and recognize characteristic metabolites.
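The three normalization methods named in the abstract can be illustrated with a minimal sketch. It assumes each spectrum is a list of binned intensities and that Inf-Norm, 1-Norm and 2-Norm mean dividing by the largest absolute intensity, the sum of absolute intensities, and the Euclidean length, respectively. The function name `normalize_spectrum` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def normalize_spectrum(intensities, method="1-norm"):
    """Scale one spectrum by a vector norm (illustrative helper, not the paper's code)."""
    magnitudes = [abs(x) for x in intensities]
    if method == "inf-norm":
        # Divide by the largest absolute intensity.
        scale = max(magnitudes)
    elif method == "1-norm":
        # Divide by the sum of absolute intensities (total area for non-negative bins).
        scale = sum(magnitudes)
    elif method == "2-norm":
        # Divide by the Euclidean length of the spectrum vector.
        scale = math.sqrt(sum(x * x for x in intensities))
    else:
        raise ValueError("unknown method: " + method)
    if scale == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [x / scale for x in intensities]


# Made-up intensities, for illustration only.
spectrum = [3.0, 4.0, 0.0, 1.0]
for name in ("inf-norm", "1-norm", "2-norm"):
    print(name, normalize_spectrum(spectrum, name))
```

Each method rescales every spectrum independently, so the choice changes the relative weight of samples and variables before PCA, which is in line with the abstract's finding that the normalization method strongly affects the analysis outcome.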
Source
《厦门大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2007, No. 6, pp. 783-787 (5 pages)
Journal of Xiamen University: Natural Science
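Funding
Supported by:
Ministry of Health Joint Research Program on Health and Education (wkj2005-2-019)
Natural Science Foundation of Fujian Province (2007J0209)
Xiamen Major Disease Research Fund (3502Z20051027)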
Keywords
metabonomics
preprocessing
normalization
variable selection