The impacts of reference database selection, indicator threshold determination and target data preparation on the sequence data analysis of eDNA monitoring: Taking fish as the target in the middle Yangtze River


Abstract: In eDNA monitoring based on meta-barcoding, the analysis and annotation of sequence data underpin the accuracy and reliability of monitoring results. Three steps are most critical: selecting the reference database, setting the analysis and annotation thresholds, and preparing the target data. To clarify the impact of each step and support the standardization of eDNA monitoring, this study analyzed two sets of COI gene sequence data from eDNA monitoring in the middle reach of the Yangtze River and ran three sets of experiments on fish detection. The experiments tested the effects on annotation results of 1) different reference databases and species annotation algorithms; 2) different OTU clustering sequence similarity thresholds and species annotation classification confidence thresholds (sequence identity and sequence coverage); and 3) different levels of sequence richness per species in the target data.

The results showed that: 1) With the BLAST algorithm, the species annotated against three versions of the NCBI nt library were largely consistent (72%-78%), as were those annotated against two local sequence reference libraries (91%-96%); across all five reference libraries, 52%-68% of the annotated species were consistent. With the nt libraries, the RDP Classifier algorithm recovered over 95% of the species annotated by BLAST and annotated 151%-443% more species, but most of the additional species were misannotations. With the local reference libraries, RDP Classifier recovered 66%-85% of the species annotated by BLAST, and several results were annotated only to family or genus level. 2) An OTU clustering sequence similarity threshold of 0.999 yielded 154%-209% more OTUs than a threshold of 0.99, and 240%-490% more OTUs annotated as fish. For classification confidence thresholds (BLAST algorithm; sequence identity and sequence coverage) from 0.8 to 0.99, the annotated species composition (over 94%) and OTU composition (over 83%) were largely consistent, whereas a threshold of 0.7 gave species and OTU compositions that differed markedly from those obtained at 0.8 and above. 3) With an OTU clustering sequence similarity threshold of 0.999 and a classification confidence threshold of 0.9, multiple-sequence data yielded the most fish species and OTUs and the highest species annotation accuracy (81.49%): 7% more species, 215% more OTUs, and 5% higher accuracy than single-sequence data.

In practical eDNA sequence data analysis and annotation, accuracy can be improved by building and refining local reference databases, optimizing the OTU clustering sequence similarity and species annotation classification confidence thresholds (sequence identity and sequence coverage), and increasing the richness of the target data. However, because of the limitations of species annotation algorithms, annotation errors and omissions are likely to persist, and the species annotation accuracy of COI-based eDNA monitoring is typically below 85%.
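The threshold experiment in result 2) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the `annotate` and `species_consistency` helpers, the toy hit values, and the placeholder species names are all hypothetical. It assumes that a single confidence threshold is applied to both sequence identity and sequence coverage, and that "consistency" between two annotation results is measured as shared species over the union of species (Jaccard index); the abstract does not state either definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment hit for an OTU (hypothetical record layout)."""
    otu: str
    species: str
    identity: float  # sequence identity as a fraction in [0, 1]
    coverage: float  # query coverage as a fraction in [0, 1]


def annotate(hits, threshold):
    """Return {otu: species}, keeping for each OTU the best hit whose
    identity AND coverage both reach the threshold (an assumption)."""
    best = {}
    for h in hits:
        if h.identity < threshold or h.coverage < threshold:
            continue  # hit fails the confidence threshold
        current = best.get(h.otu)
        if current is None or (h.identity, h.coverage) > (current.identity, current.coverage):
            best[h.otu] = h
    return {otu: h.species for otu, h in best.items()}


def species_consistency(annotation_a, annotation_b):
    """Jaccard index of the two annotated species sets (assumed metric)."""
    a, b = set(annotation_a.values()), set(annotation_b.values())
    if not a and not b:
        return 1.0  # two empty results are trivially identical
    return len(a & b) / len(a | b)


# Toy hits with placeholder names; these are NOT data from the study.
hits = [
    Hit("OTU1", "species_A", 0.995, 0.99),
    Hit("OTU1", "species_B", 0.970, 0.99),
    Hit("OTU2", "species_B", 0.850, 0.90),
    Hit("OTU3", "species_C", 0.750, 0.95),
]

if __name__ == "__main__":
    reference = annotate(hits, 0.9)
    for t in (0.7, 0.8, 0.9):
        result = annotate(hits, t)
        print(t, sorted(result.items()), species_consistency(result, reference))
```

Lowering the threshold admits weaker hits, so the annotated species set grows and its overlap with the stricter result shrinks. That is the kind of divergence the abstract reports between a threshold of 0.7 and thresholds of 0.8 and above.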

Authors: Xu Lanxin; Yang Haile; Liu Zhigang; Du Hao (Key Laboratory of Freshwater Biodiversity Conservation, Ministry of Agriculture and Rural Affairs, Yangtze River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Wuhan 430223, P.R. China; Wuxi Fisheries College, Nanjing Agricultural University, Wuxi 214000, P.R. China)

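Affiliations: Yangtze River Fisheries Research Institute, Chinese Academy of Fishery Sciences; Wuxi Fisheries College, Nanjing Agricultural University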

Source: Journal of Lake Sciences (《湖泊科学》; indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2024, Issue 6, pp. 1843-1852 (10 pages)

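Funding: Jointly supported by the Central Public-interest Scientific Institution Basal Research Fund (YFI202201) and the agricultural fiscal special project "Routine Monitoring after the Yangtze River Fishing Ban" (CJJC-2023-01).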

Keywords: environmental DNA; fish; meta-barcoding; reference database; OTU clustering sequence similarity; species annotation classification confidence; middle Yangtze River

Classification code: G63 [Cultural Sciences: Education]
