Abstract
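With the continued growth of metagenomic data, it has become feasible to build an integrated analysis platform around high-quality metagenomic samples (also called "microbial communities"), so that such samples can be analyzed, compared, and searched effectively and deeper biological insight can be drawn from them. However, most current metagenomic databases offer only simple data storage, lack good sample annotation, or provide very few analysis functions. In addition, existing methods for computing similarity between microbial community data can handle only a very limited number of samples. Scientists have long sought effective ways to compute similarities among massive numbers of microbial communities, in order to study how similar samples are and to uncover correlations in the associated metagenomic information. Meta-Mesh is a new online metagenomic analysis system comprising a metagenomic database and an analysis platform. It analyzes metagenomic samples systematically and efficiently, and supports community-structure comparison and accurate search of samples. The database has collected more than 7,000 high-quality, well-annotated samples from the public domain and in-house laboratories. The analysis platform provides a range of online tools that perform community-structure analysis and annotation of metagenomic samples, compare them from multiple angles, and efficiently search the database for similar samples using a fast indexing strategy and a community-structure similarity algorithm. A series of metagenomic case studies, including "database search for identification of human microbial community samples" and "sample clustering based on a similarity matrix", demonstrates Meta-Mesh's analytical performance. As an online metagenomic database and analysis system, Meta-Mesh will serve the rapid analysis, identification, comparison, and search of metagenomic samples.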
With the current accumulation of metagenomic data, it is now possible to build an integrated platform for processing rigorously selected metagenomic samples of interest (also referred to here as "metagenomic communities"). Any metagenomic sample could then be searched against this database to find the most similar sample(s). However, current databases holding large numbers of metagenomic samples mostly serve as data repositories rather than well-annotated databases, and offer only a few analysis functions. Moreover, the few available methods for measuring the similarity of metagenomic data can compare only a small, pre-defined set of metagenomes. Scientists have long sought to calculate similarities between microbial communities in a large repository effectively, to examine how similar these samples are, and to find correlations in the meta-information of these samples. In this work we propose a novel system, Meta-Mesh, which includes a metagenomic database and a companion analysis platform that can systematically and efficiently analyze, compare, and search for similar metagenomic samples. For the database, we have collected more than 7,000 high-quality, well-annotated metagenomic samples from the public domain and in-house facilities. The analysis platform supplies a set of online tools that accept metagenomic samples, build taxonomical annotations, compare samples from multiple angles, and then search the database for similar samples using a fast indexing strategy and a scoring function. We also present case studies of "database search for identification" and "sample clustering based on a similarity matrix", using human-associated habitat samples, to demonstrate the performance of Meta-Mesh in metagenomic analysis. Meta-Mesh can therefore serve as a database and data analysis system for quickly parsing and identifying similar metagenomic samples from a large pool of well-annotated samples.
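The idea of searching a database for the samples most similar to a query can be illustrated with a small sketch. The abstract does not specify Meta-Mesh's indexing strategy or scoring function, so the code below swaps in a common stand-in: similarity defined as one minus the Bray-Curtis dissimilarity between normalized taxon abundance profiles, with a brute-force scan instead of an index. The function names, the sample identifiers, and the abundance values are all hypothetical.

```python
from typing import Dict, List, Tuple

# A profile maps a taxon name to its abundance in one sample (hypothetical format).
Profile = Dict[str, float]


def normalize(profile: Profile) -> Profile:
    """Scale abundances so they sum to 1 (relative abundance)."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must have a positive total abundance")
    return {taxon: value / total for taxon, value in profile.items()}


def similarity(a: Profile, b: Profile) -> float:
    """Return 1 - Bray-Curtis dissimilarity of the two normalized profiles.

    For profiles that each sum to 1, this equals the sum over shared taxa
    of the smaller relative abundance. This is an assumed stand-in, not
    the scoring function used by Meta-Mesh.
    """
    na, nb = normalize(a), normalize(b)
    return sum(min(na[t], nb[t]) for t in na.keys() & nb.keys())


def search(query: Profile, database: Dict[str, Profile],
           top_n: int = 3) -> List[Tuple[str, float]]:
    """Brute-force scan: score every database sample, return the best top_n."""
    scored = [(sample_id, similarity(query, profile))
              for sample_id, profile in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy database and query, for illustration only.
DEMO_DB: Dict[str, Profile] = {
    "gut_1": {"Bacteroides": 60, "Faecalibacterium": 40},
    "oral_1": {"Streptococcus": 70, "Veillonella": 30},
}
DEMO_QUERY: Profile = {"Bacteroides": 50, "Faecalibacterium": 50}

if __name__ == "__main__":
    for sample_id, score in search(DEMO_QUERY, DEMO_DB):
        print(sample_id, round(score, 3))
```

A linear scan like this costs one comparison per stored sample, which is why a system holding thousands of samples would need some form of indexing to narrow the candidates first. The pairwise scores produced by `similarity` could also be assembled into a similarity matrix as input to clustering.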
Source
《生物工程学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2014, Issue 1, pp. 6-17 (12 pages)
Chinese Journal of Biotechnology
Funding
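Supported by the e-Science Project of the Chinese Academy of Sciences (No. INFO-115-D01-Z006), the National High Technology Research and Development Program of China (863 Program) (Nos. 2009AA02Z310, 2012AA02A707, 2014AA21502), and the National Natural Science Foundation of China (Nos. 61103167, 31271410, 61303161).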
Keywords
microbial community, metagenome, database, data mining, online service, similarity network