Abstract
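Ultraviolet (UV) spectroscopy was applied to the rapid determination of total nitrogen content in aquaculture water. To eliminate the influence of systematic and random errors on the predictive performance of the model, the measured total nitrogen concentrations and spectral absorbance data of 88 water samples were used as the raw data. Model building was divided into five stages: sample set division, data preprocessing, feature band extraction, model selection, and selection of the number of LVs, with the aim of achieving the best prediction. Several methods were compared in each of the first four stages. The results show that every stage is indispensable, and that the most suitable modeling process and methods for total nitrogen determination can be found only by comparing their strengths and weaknesses. First, the raw data were divided into sample sets in the same way using the concentration gradient (CG) method, and on this basis three models were built: principal component regression (PCR), stepwise regression (SR) and partial least squares regression (PLSR). PLSR gave the best prediction and was chosen as the prediction model for this paper. The modeling performance of PLSR depends strongly on the number of latent variables (LVs); the number of LVs at which the root mean square error of prediction (RMSEP) is smallest is usually taken as optimal. Second, four sample set division algorithms, CG, random sampling (RS), Kennard-Stone (KS) and SPXY, were applied to the samples and the prediction performance of the resulting PLSR models was compared; SPXY was finally selected as the best division algorithm. Third, with the sample set divided by SPXY, several preprocessing algorithms were applied to the spectral absorbance data: three single algorithms, wavelet transform (WT), first derivative (Der1st) and second derivative (Der2nd), and two combined algorithms that pair the wavelet transform with each derivative, WT-Der1st and WT-Der2nd. Fourth, two feature band extraction methods, the successive projections algorithm (SPA) and stepwise regression (SR), were applied to the preprocessed data; the comparison shows that SPA extracts features more efficiently than SR and yields better models. SPA both greatly simplifies the model and improves its prediction accuracy to some extent. The feature band extracted by WT-Der1st-SPA is 218 nm, which is consistent with the characteristic band range of total nitrogen, indicating that the method is reasonably sound. Across the 10 PLSR models built above, and weighing prediction accuracy against model complexity, the PLSR model based on WT-Der1st-SPA was finally selected as the optimal model; its coefficient of determination for prediction (r²) is 0.996 and its RMSEP is 0.042 mg·L⁻¹. The established model therefore predicts very well and can determine the total nitrogen content of water rapidly and accurately, providing experience for online detection and rapid determination of other aquaculture water quality indicators using spectroscopic techniques.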
This paper aims at rapid determination of the total nitrogen (TN) concentration, one of the most important indicators of pollution in aquaculture water, using ultraviolet (UV) spectroscopy. The original dataset contains 88 samples, each with a measured concentration value and spectral absorbance values. The optimal model is selected through five stages: sample set division, data preprocessing, feature band extraction, model selection, and selection of the number of latent variables (LVs). Comparing different methods in the first four stages shows that each stage is necessary, and that the most suitable modeling process and method can be found only by comparing the strengths and weaknesses of the modeling results obtained with the various algorithms. First, the original sample set is divided by the concentration gradient (CG) method, and three models are built: principal component regression (PCR), stepwise regression (SR) and partial least squares regression (PLSR). PLSR proves to be the best prediction model. The number of LVs strongly influences model accuracy, and the number of LVs at which the root mean square error of prediction (RMSEP) is smallest is usually taken as optimal. Second, comparing the random sampling (RS), concentration gradient (CG), Kennard-Stone (KS) and SPXY algorithms shows that SPXY is the best. Third, based on the SPXY division, five preprocessing algorithms are applied: three single algorithms, wavelet transform (WT), first derivative (Der1st) and second derivative (Der2nd), and two combined algorithms, WT-Der1st and WT-Der2nd. Fourth, the successive projections algorithm (SPA) and stepwise regression (SR) are applied to the preprocessed data for feature band extraction; the results show that SPA not only greatly reduces model complexity but also improves prediction accuracy. The feature band extracted by WT-Der1st-SPA is 218 nm, which is consistent with the characteristic band range of total nitrogen, indicating that the method is relatively sound. Finally, considering both prediction accuracy and model complexity, the PLSR model based on WT-Der1st-SPA gives the best results among the 10 models, with a determination coefficient (r²) of 0.996 and an RMSEP of 0.042 mg·L⁻¹ for the prediction set. In short, the established prediction model can be applied to the rapid and accurate determination of total nitrogen concentration. Moreover, this study lays a foundation for online analysis of aquaculture water and rapid determination of other water quality parameters.
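The abstract names SPXY as the preferred sample set division and RMSEP as the selection criterion, but gives no formulas. The following is a minimal sketch of how such a partition is commonly described: Kennard-Stone style max-min selection on a joint distance that adds the spectral (x) distance and the concentration (y) distance, each divided by its maximum. This formulation, and the names `spxy_split`, `pairwise` and `rmsep`, are assumptions for illustration and are not taken from the paper.

```python
import math


def pairwise(points, dist):
    """Full symmetric matrix of distances between all pairs of points."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def spxy_split(X, y, n_cal):
    """Return (calibration_indices, prediction_indices).

    X: list of spectra (each a list of absorbance values).
    y: list of reference concentrations.
    Assumed formulation: joint distance = dx / max(dx) + dy / max(dy).
    """
    n = len(X)
    if not 2 <= n_cal <= n:
        raise ValueError("n_cal must be between 2 and the number of samples")

    dx = pairwise(X, math.dist)                   # Euclidean distance in x
    dy = pairwise(y, lambda a, b: abs(a - b))     # absolute difference in y
    max_dx = max(map(max, dx)) or 1.0             # guard against all-equal data
    max_dy = max(map(max, dy)) or 1.0
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)]
         for i in range(n)]

    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: d[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]

    # Repeatedly add the sample whose nearest selected neighbour is farthest.
    while len(selected) < n_cal:
        best = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over the prediction set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

With a model fitted for each candidate number of LVs, one would compute `rmsep` on the prediction indices returned by `spxy_split` and keep the LV count with the smallest value, which is the criterion the abstract describes. The pairwise matrices are O(n²) in memory, which is unproblematic at the scale of 88 samples.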
Authors
李鑫星
周婧
唐红
孙龙清
曹霞敏
张小栓
LI Xin-xing; ZHOU Jing; TANG Hong; SUN Long-qing; CAO Xia-min; ZHANG Xiao-shuan (Beijing Laboratory of Food Quality and Safety, College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; Yantai Institute of China Agricultural University, Yantai 264000, China; School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215200, China; College of Engineering, China Agricultural University, Beijing 100083, China)
Source
《光谱学与光谱分析》
SCIE
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2020, Issue 1, pp. 195-201 (7 pages)
Spectroscopy and Spectral Analysis
Funding
Supported by the National Key Research and Development Program of China (2017YFE0111200)
Keywords
Ultraviolet spectroscopy
Total nitrogen
Wavelet transform
Successive projections algorithm
Latent variables (LVs)
Partial least squares regression (PLSR)