Research on SSR Molecular Markers for Traceability of Tea Origins Based on Deep Neural Network
Abstract 【Objective】This study aimed to distinguish tea varieties and trace their geographic origin, and to provide a reference for the classification of other plants.【Method】Using simple sequence repeat (SSR) markers and bioinformatics methods, the geographic origin of 313 tea samples from Hunan, Yunnan, Fujian and Zhejiang Provinces was investigated, together with their relationships to 10 outgroup samples. First, 54 high-quality SSR loci were screened, and the differences among tea samples from the provinces were analyzed by principal component analysis (PCA) and by constructing a phylogenetic tree. Second, the classification accuracy of a linear regression model, a random forest model and a deep neural network (DNN) model was compared, and the DNN model, which had the highest accuracy, was selected for building and optimizing the traceability model.【Result】Samples from each of the four provinces clustered together, and the Yunnan samples differed most from those of the other provinces. Samples from Fujian, Zhejiang and Hunan formed separate clusters, indicating significant differences among the teas of these three provinces; a small amount of overlap nevertheless pointed to partly similar genetic structure and a close relationship among them. Fitting the three models to the matrix of 54 SSR markers gave preliminary accuracies of 81% for the linear regression model, 77% for the random forest model and 86% for the DNN model, so the DNN model classified tea best. A prediction model was then built from the 54 SSR markers and 323 samples, and four hyperparameters were optimized: the number of samples per training batch (batch size), the number of training steps (step size), the number of hidden layers and the number of nodes per layer. Accuracy on the validation and test sets was highest, at about 95%, with a batch size of 150, 20000 training steps and 2 hidden layers; a neural network with 2 hidden layers therefore performed best for tea analysis.【Conclusion】DNN-based analysis of SSR molecular markers supports tea classification, origin traceability and tea breeding research. The constructed classification model can also be used to identify the geographic origin of resequencing data from other species.
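The optimized configuration reported in the abstract (2 hidden layers, batch size 150) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic stand-ins for a 323 x 54 marker matrix, and the hidden-layer widths (32, 16), the learning rate, the four class labels and the shortened run of 500 steps (rather than the 20000 reported) are all assumptions chosen only to keep the example small.

```python
import numpy as np

def make_synthetic_markers(n_samples=323, n_loci=54, n_classes=4, seed=0):
    """Synthetic stand-in for an SSR marker matrix (NOT the study's data)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_classes, size=n_samples)
    centers = rng.normal(0.0, 1.0, size=(n_classes, n_loci))
    features = centers[labels] + rng.normal(0.0, 1.0, size=(n_samples, n_loci))
    return features, labels

def init_params(sizes, rng):
    """One (weights, bias) pair per layer, He-initialised."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every layer; the last entry holds class probabilities."""
    activations = [x]
    for i, (w, b) in enumerate(params):
        z = activations[-1] @ w + b
        if i < len(params) - 1:
            activations.append(np.maximum(z, 0.0))      # ReLU hidden layer
        else:
            z = z - z.max(axis=1, keepdims=True)        # numerically stable softmax
            e = np.exp(z)
            activations.append(e / e.sum(axis=1, keepdims=True))
    return activations

def train(x, y, hidden=(32, 16), batch_size=150, steps=500, lr=0.05, seed=0):
    """Mini-batch gradient descent on cross-entropy loss.

    hidden, steps and lr are hypothetical values; only batch_size=150 and the
    use of two hidden layers follow the abstract.
    """
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    params = init_params([x.shape[1], *hidden, n_classes], rng)
    for _ in range(steps):
        idx = rng.choice(len(x), size=min(batch_size, len(x)), replace=False)
        acts = forward(params, x[idx])
        # Gradient of the mean cross-entropy with respect to the output logits.
        delta = acts[-1].copy()
        delta[np.arange(len(idx)), y[idx]] -= 1.0
        delta /= len(idx)
        for layer in range(len(params) - 1, -1, -1):
            w, b = params[layer]
            grad_w = acts[layer].T @ delta
            grad_b = delta.sum(axis=0)
            if layer > 0:
                # Back-propagate through the ReLU of the previous hidden layer.
                delta = (delta @ w.T) * (acts[layer] > 0)
            params[layer] = (w - lr * grad_w, b - lr * grad_b)
    return params

def accuracy(params, x, y):
    """Fraction of samples whose most probable class matches the label."""
    return float((forward(params, x)[-1].argmax(axis=1) == y).mean())

if __name__ == "__main__":
    x, y = make_synthetic_markers()
    model = train(x[:250], y[:250])          # hypothetical 250/73 train/test split
    print("held-out accuracy:", accuracy(model, x[250:], y[250:]))
```

Because the synthetic classes are well separated by construction, the accuracy printed here says nothing about the 95% figure in the abstract; the sketch only shows how batch size, step count and layer structure enter the training loop.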
Authors GONG Hao; ZHANG Lili; CHEN Furong; LIN Lixia; CHEN Yijun; ZHANG Le; SUN Chunlian; SUN Jian (School of Life Science, Huizhou University, Huizhou 516007, China; School of Economics and Management, Huizhou University, Huizhou 516007, China)
Source Guangdong Agricultural Sciences (CAS), 2023, No. 9, pp. 108-116 (9 pages)
Funding Guangdong Provincial Science and Technology Innovation Strategy Special Fund (pdjh2023b0500); Huizhou University Professor and Doctoral Startup Project (2021JB017).
Keywords tea; SSR; PCA; deep neural network; traceability; molecular marker