Abstract
This study aims to identify benthic animals in marine ranches in situ under water, using random forest algorithms to classify the target organisms, mine the data more deeply, and improve working efficiency and the reliability of decision making. Using a self-developed underwater hyperspectral imaging analyzer, hyperspectral data were acquired in different underwater environments for five economically important animals common in marine ranches: Yesso scallop (虾夷扇贝), Zhikong scallop (栉孔扇贝), veined rapa whelk (脉红螺), Pacific abalone (皱纹盘鲍), and Japanese sea cucumber (仿刺参). After normalization, three random forest algorithms were used to classify the five benthic animals and were compared: random forest (RF), random forest based on principal component analysis (PCA-RF), and random forest based on recursive feature elimination (RFE-RF). The RF variable importance ranking was used to select the reflectance spectral intensity data of the highly ranked bands that contribute most to the model; these top-ranked feature bands were then fed to the classifier, and the classification accuracy was obtained after parameter optimization. The classification results were output as a confusion matrix showing how each of the five samples was identified. Recognition accuracy was lowest for the veined rapa whelk (64%) and highest for the Japanese sea cucumber and Zhikong scallop (100%); it was 91% for the Yesso scallop and 96% for the Pacific abalone. The overall classification accuracies of the three methods were 90.13% (RF), 95.20% (PCA-RF), and 98.74% (RFE-RF), a fairly satisfactory result that demonstrates the feasibility of applying random forest algorithms to the classification of underwater hyperspectral data.
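The comparison described in the abstract (normalization, then RF, PCA-RF, and RFE-RF, followed by an importance ranking and a confusion matrix) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation. The spectra are synthetic stand-ins, and the min-max normalization, the band count, the number of principal components, the number of bands kept by RFE, and all forest hyperparameters are assumptions chosen only to make the example run; the accuracies it prints say nothing about the paper's results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
N_CLASSES, N_PER_CLASS, N_BANDS = 5, 40, 60  # hypothetical sizes

# Synthetic stand-in for reflectance spectra (NOT the paper's data):
# each class gets one Gaussian bump at a different band, plus noise.
bands = np.arange(N_BANDS)
X = np.vstack([
    np.exp(-0.5 * ((bands - 10 * (c + 1)) / 4.0) ** 2)
    + 0.3 * rng.standard_normal((N_PER_CLASS, N_BANDS))
    for c in range(N_CLASSES)
])
y = np.repeat(np.arange(N_CLASSES), N_PER_CLASS)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def forest():
    """Return a fresh random forest; hyperparameters are illustrative."""
    return RandomForestClassifier(n_estimators=100, random_state=0)


# Per-band min-max scaling is an assumed form of "normalization".
models = {
    "RF": make_pipeline(MinMaxScaler(), forest()),
    "PCA-RF": make_pipeline(MinMaxScaler(), PCA(n_components=10), forest()),
    "RFE-RF": make_pipeline(
        MinMaxScaler(),
        RFE(forest(), n_features_to_select=15, step=5),
        forest(),
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred, labels=np.arange(N_CLASSES))
    # Per-class accuracy: correctly identified samples / true samples per class.
    per_class = cm.diagonal() / cm.sum(axis=1)
    results[name] = (accuracy_score(y_test, y_pred), cm, per_class)
    print(f"{name}: overall accuracy = {results[name][0]:.3f}")

# Rank bands by importance in the plain RF model, highest first.
rf = models["RF"][-1]
top_bands = np.argsort(rf.feature_importances_)[::-1][:10]
print("Top-ranked band indices:", top_bands)
```

Each row of a confusion matrix corresponds to a true class, so dividing the diagonal by the row sums gives per-class recognition rates of the kind quoted in the abstract.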
Authors
董建江 (DONG Jian-jiang)
田野 (TIAN Ye)
张建兴 (ZHANG Jian-xing)
栾振东 (LUAN Zhen-dong)
杜增丰 (DU Zeng-feng)
Affiliations: College of Physics and Optoelectronic Engineering, Ocean University of China, Qingdao 266100, China; Key Laboratory of Marine Geology and Environment & Center of Deep Sea Research, Institute of Oceanology, Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao 266071, China
Source
《光谱学与光谱分析》
SCIE
EI
CAS
CSCD
PKU Core Journal (北大核心)
2023, No. 10, pp. 3015-3022 (8 pages)
Spectroscopy and Spectral Analysis
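Funding
National Key Research and Development Program of China (2019YFD0900802)
Key Deployment Project of the Center for Ocean Mega-Science, Chinese Academy of Sciences (COMS2019J06)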
Keywords
Random Forest
Hyperspectral imaging
Classification
In situ identification
Benthic fauna
Feature selection