Abstract
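China covers a vast territory. Because natural conditions differ from region to region and plant species are numerous, its forest plants and forest types are extremely diverse, and accurate leaf identification is of great significance for tree research. Leaf classification is a challenging task that requires recognizing and classifying multiple leaf features such as morphology, texture, and color. This paper proposes a leaf classification and recognition method based on the Cross Vision Transformer (CrossViT). The experimental subjects are the leaves of 10 broadleaf tree species common in landscape greening: Fatsia japonica, Rhododendron simsii, Magnolia grandiflora, Cinnamomum cassia, Pittosporum tobira, Hibiscus syriacus, Photinia serratifolia, Firmiana simplex, Ginkgo biloba, and Camphora officinarum. First, leaf images were photographed in both an experimental environment and a real environment to build the datasets. Second, within the CrossViT network structure, two independent branches were constructed to obtain embedding vectors of different sizes; the Transformer encoder was optimized, and a cross-attention module was used to fuse the embedding vectors of different sizes so as to balance computational cost and recognition accuracy. Finally, an MLP Head produced the final classification result. Training and testing on the leaf datasets from the two environments showed that the CrossViT model used in this study achieved an overall accuracy of about 92.5% on the experimental-environment dataset and about 75.2% on the real-environment dataset. Compared with traditional convolutional networks, the proposed method scored 0.6 to 4.0 percentage points higher on the experimental-environment dataset and 1.3 to 3.3 percentage points higher on the real-environment dataset, with slight increases in FLOPs and model parameters.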
China, characterized by its expansive territory and diverse ecological conditions, hosts a rich tapestry of forest flora, showcasing extensive botanical diversity. Accurate leaf recognition is a pivotal component in botanical research, requiring meticulous identification and classification of intricate leaf attributes such as shape, texture, and color. This study introduced an innovative leaf classification and recognition methodology based on the Cross Vision Transformer (CrossViT). The research focused on ten distinct types of leaves: Fatsia japonica, Rhododendron simsii, Magnolia grandiflora, Cinnamomum cassia, Pittosporum tobira, Hibiscus syriacus, Photinia serratifolia, Firmiana simplex, Ginkgo biloba, and Camphora officinarum. Comprehensive datasets were curated by capturing leaf images under controlled experimental conditions and in diverse real-world environments. This meticulous approach ensured the robustness of the dataset used for training and validation of the CrossViT model. Central to the methodology is the enhancement of the CrossViT model's architecture. Dual independent branches were incorporated to generate embedding vectors of varying dimensions, effectively capturing a wide range of leaf image features. The Transformer encoder was further optimized through the integration of a cross-attention mechanism, facilitating the seamless fusion of embedding vectors across different scales. This strategic refinement aimed to strike a balance between computational efficiency and classification accuracy, enhancing the model's performance in high-precision leaf categorization tasks. The classification process utilized a Multilayer Perceptron (MLP) Head, which successfully yielded robust results. Evaluation across distinct environmental settings revealed significant achievements, with an overall accuracy of approximately 92.5% in the controlled experimental dataset and 75.2% in the real-world dataset. The comparative analysis with traditional convolutional neural networks (CNNs) highlighted notable performance advantages of the CrossViT-based approach. In the controlled experimental environment, performance improvements ranged from 0.6 to 4.0 percentage points, while in the real-world scenario, improvements ranged from 1.3 to 3.3 percentage points. Despite a modest increase in floating-point operations (FLOPs) and model parameters, the CrossViT model demonstrated substantial gains in accuracy, underscoring its efficacy in leaf classification and recognition tasks. In conclusion, the proposed CrossViT-based methodology represents an efficient and effective approach to advance tree research and ecological conservation. By leveraging advanced deep learning techniques, this study contributes significantly to the disciplines of botany and environmental science, addressing critical challenges in biodiversity monitoring and sustainable natural resource management. The findings hold promise for enhancing our understanding and preservation of global forest ecosystems, emphasizing the importance of technological innovation in fostering environmental stewardship and conservation efforts worldwide.
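The cross-attention fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the general CrossViT pattern in which the class (CLS) token of one branch is projected to the other branch's embedding size, attends over that branch's patch tokens, and is projected back. It uses a single attention head, random weights, no layer normalization, and toy embedding sizes (4 and 6); all function and parameter names (`cross_attention`, `proj_in`, `wq`, and so on) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vec, weight):
    """Multiply a vector of length d_in by a d_in x d_out weight matrix."""
    return [sum(v * w for v, w in zip(vec, col)) for col in zip(*weight)]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cross_attention(cls_token, other_patches, params):
    """Fuse one branch's CLS token with the other branch's patch tokens."""
    # Project the CLS token into the other branch's embedding size.
    cls_proj = linear(cls_token, params["proj_in"])
    # Keys and values come from the projected CLS token plus the patches.
    tokens = [cls_proj] + other_patches
    query = linear(cls_proj, params["wq"])
    keys = [linear(t, params["wk"]) for t in tokens]
    values = [linear(t, params["wv"]) for t in tokens]
    dim = len(query)
    # Scaled dot-product attention with the CLS token as the only query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    weights = softmax(scores)
    attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    # Residual connection, then project back to the original branch size.
    fused = [a + b for a, b in zip(cls_proj, attended)]
    return linear(fused, params["proj_out"]), weights


rng = random.Random(0)
SMALL_DIM, LARGE_DIM, NUM_PATCHES = 4, 6, 5  # toy sizes, chosen for illustration
params = {
    "proj_in": random_matrix(SMALL_DIM, LARGE_DIM, rng),
    "wq": random_matrix(LARGE_DIM, LARGE_DIM, rng),
    "wk": random_matrix(LARGE_DIM, LARGE_DIM, rng),
    "wv": random_matrix(LARGE_DIM, LARGE_DIM, rng),
    "proj_out": random_matrix(LARGE_DIM, SMALL_DIM, rng),
}
small_cls = [rng.uniform(-1, 1) for _ in range(SMALL_DIM)]
large_patches = [[rng.uniform(-1, 1) for _ in range(LARGE_DIM)] for _ in range(NUM_PATCHES)]

fused_cls, attn = cross_attention(small_cls, large_patches, params)
print(len(fused_cls), len(attn))
```

Because only the CLS token acts as the query, the cost of this fusion step grows linearly with the number of patch tokens rather than quadratically, which is one way such a design can trade computational cost against accuracy.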
Authors
许兵博
张怀清
薛联凤
云挺
XU Bingbo; ZHANG Huaiqing; XUE Lianfeng; YUN Ting (College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China; Chinese Academy of Forestry, Beijing 100091, China; College of Forestry and Grassland, Nanjing Forestry University, Nanjing 210037, China)
Source
Journal of Forestry Engineering (《林业工程学报》)
CSCD
Peking University Core Journal
2024, No. 6, pp. 161-172 (12 pages)
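Funding
National Natural Science Foundation of China (31770591, 32071681)
Natural Science Foundation of Jiangsu Province, General Program (BK20221337)
Jiangsu Agricultural Independent Innovation Project (CX(22)3048)
Open Fund of the Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources (KLSMNR-G202208)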
Keywords
tree species identification
Cross Vision Transformer (CrossViT) model
self-attention
visualization
plant phenotype analysis