Abstract
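Using the open-source statistical software platform R, and taking HPLC fingerprint data of traditional Chinese medicine (TCM) as an example, this study builds a data mining model for multi-dimensional, multi-information feature data and analyzes it visually. The results show that the composite principal components obtained after PCA dimensionality reduction reflect the patterns in multi-dimensional, multi-information feature data. They also verify that the principal component clustering model and the neural network model are effective and practical for revealing the information features of such data. Finally, based on the results of the principal component cluster analysis, a model is built to predict and identify the geographical origin of Chuanxiong (Ligusticum chuanxiong) samples of unknown origin.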
Built on the open-source R statistical environment, this paper constructs a data mining model for complex, multi-dimensional HPLC fingerprint data and analyzes it visually. The results show that PCA can serve as a model for revealing the patterns in multi-dimensional data, and they verify the validity and practicality of principal component and cluster analysis, and of neural networks, for revealing the characteristics of such data. Finally, based on the results of PCA and cluster analysis, the paper uses machine learning and related statistical algorithms to train a network model that predicts the habitat (geographical origin) of TCM samples of unknown origin, providing evidence to support TCM quality control.
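The abstract names three steps (PCA dimensionality reduction, clustering on the principal component scores, and a model that predicts the origin of unknown samples) without giving details. The following is a minimal Python/NumPy sketch of that pipeline under stated assumptions, not the authors' R implementation: it assumes autoscaled peak-area data (rows are samples, columns are fingerprint peaks), uses k-means as the clustering step because the abstract does not say which clustering algorithm was used, and replaces the paper's neural network with a simple nearest-centroid rule in PC space. The function names `pca_fit`, `pca_transform`, `kmeans` and `predict_origin` are hypothetical, and the data are synthetic.

```python
import numpy as np


def pca_fit(X, n_components=2):
    """Autoscale the columns of X, then compute principal component scores via SVD.

    Assumption: autoscaling (mean 0, unit variance per peak); the abstract
    does not state how the fingerprint data were preprocessed.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant peaks unscaled instead of dividing by zero
    Z = (X - mean) / std
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[:n_components]                      # one row per principal component
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    scores = Z @ loadings.T                           # sample coordinates in PC space
    return scores, explained, (mean, std, loadings)


def pca_transform(model, X):
    """Project new samples with the mean, scale and loadings learned by pca_fit."""
    mean, std, loadings = model
    return ((np.asarray(X, dtype=float) - mean) / std) @ loadings.T


def kmeans(points, k, n_iter=100):
    """Plain k-means on PC scores (an assumed stand-in for the paper's clustering step).

    Uses deterministic farthest-point initialisation so results are reproducible.
    """
    centroids = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(dist)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


def predict_origin(model, centroids, X_new):
    """Assign unknown samples to the nearest cluster centroid in PC space.

    Simplification: a nearest-centroid rule, NOT the neural network used in the paper.
    """
    scores = pca_transform(model, X_new)
    dist = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)


# Synthetic, hypothetical data: 3 origins x 6 samples x 10 peaks.
rng = np.random.default_rng(42)
profiles = rng.uniform(1.0, 10.0, size=(3, 10))      # one mean peak profile per origin
true_origin = np.repeat(np.arange(3), 6)
X = profiles[true_origin] + rng.normal(0.0, 0.1, size=(18, 10))

scores, explained, model = pca_fit(X, n_components=2)
labels, centroids = kmeans(scores, k=3)
unknown = profiles[1] + rng.normal(0.0, 0.1, size=(1, 10))  # a new sample drawn from origin 1
print("explained variance ratio:", explained)
print("cluster labels:", labels)
print("predicted cluster of unknown sample:", predict_origin(model, centroids, unknown))
```

Cluster indices are arbitrary, so a predicted cluster has to be mapped back to an origin through the training samples that fall into it. Fitting PCA once and reusing its mean, scale and loadings for unknown samples keeps new fingerprints in the same coordinate system as the training set.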
Source
《现代图书情报技术》
CSSCI
Peking University Core Journal
2011, Issue 12, pp. 69-73 (5 pages)
New Technology of Library and Information Service
Funding
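One of the research outcomes of the Ministry of Education Chunhui Program project "Research on Computer Simulation Recognition Technology for Complex Data of TCM Fingerprints" (Project No. Z2007-1-61004)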
Keywords
Multi-dimensional information
Data mining
Principal component and cluster analysis
Neural network