Abstract
To explore why assembled insect genome sizes deviate from genome sizes estimated by flow cytometry, the insect genome size datasets generated by the two methods were compared. The genome sizes of 21 common agricultural insects from ten families in six orders were first estimated by flow cytometry, and flow cytometry genome size estimates for 1345 insects were then collected from the Animal Genome Size Database. In addition, genome assembly information for 536 insect species was collected from 14 databases (NCBI, GigaDB, DDBJ, DNA Zoo, NGDC, DRYAD, BIPAA, i5k workspace@NAL, InsectBase, VectorBase, LepBase, FireflyBase, SilkDB and Trichoplusia ni Genome Database). In total, 202 insect species had both kinds of genome size values. Based on the technical principles of the method, the flow cytometry estimates were taken to be closer to the true values and were used as the reference. Relative to the flow cytometry estimates, the assembled genome sizes of 98 insects were smaller, those of 42 insects were larger, and those of 62 insects were similar. According to the Wilcoxon rank-sum test, insects with larger assembled genome sizes had significantly more repetitive sequences; however, no significant association was found with GC content, contig N50, or sequencing and assembly strategy (sequencing platform and assembly software). In summary, assembled insect genome sizes were smaller in most cases, indicating incomplete assembly; when the proportion of repetitive sequences in the genome was high, however, the assembled genome size could be larger because of assembly redundancy.
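The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not give the cutoff used to call two sizes "similar", so the 10% `tolerance` below is hypothetical; the function names are invented for this sketch; the rank-sum test uses the normal approximation without a tie correction; and the repeat percentages are synthetic placeholders, not data from the study.

```python
import math


def classify(assembled_mb, flow_mb, tolerance=0.10):
    """Label an assembly relative to its flow cytometry estimate.

    `tolerance` is a hypothetical cutoff; the abstract does not state one.
    """
    ratio = assembled_mb / flow_mb
    if ratio > 1 + tolerance:
        return "larger"
    if ratio < 1 - tolerance:
        return "smaller"
    return "similar"


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, p), where z > 0 means x tends to have larger values than y.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    return z, p


# Synthetic repeat-content percentages, for illustration only.
repeats_larger = [48.0, 52.5, 61.0, 44.2, 57.3]
repeats_smaller = [21.0, 33.4, 18.9, 29.5, 40.1]
z, p = rank_sum_test(repeats_larger, repeats_smaller)
print(classify(600.0, 500.0), round(z, 3), round(p, 4))
```

For small groups an exact test is more appropriate than the normal approximation used here; the approximation is kept only to make the sketch dependency-free.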
Authors
丛宇阳
贺康
舒润国
程梓淇
陈昊
王亚琴
李飞
Cong Yuyang; He Kang; Shu Runguo; Cheng Ziqi; Chen Hao; Wang Yaqin; Li Fei (Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Institute of Insect Sciences, Zhejiang University, Hangzhou 310058, Zhejiang Province, China; State Key Laboratory of Rice Biology, Institute of Biotechnology, Zhejiang University, Hangzhou 310058, Zhejiang Province, China)
Source
《植物保护学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2021, Issue 6, pp. 1217-1225 (9 pages)
Journal of Plant Protection
Funding
National Natural Science Foundation of China (31972354).