Abstract
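Modern biology is not only an experimental science but also a data science. To make biological data more FAIR (Findable, Accessible, Interoperable, Reusable), so that both people and machines can use them, data standardization must be put into practice. Standardization can be achieved in many ways, but for biological data, standardizing experimental results is currently the most feasible. Databases that store experimental results all consist of three parts: a data management system, a conversion layer, and a data interface. However, because databases are built for different purposes and serve different users, the same part often follows different standards from one database to another, which greatly reduces the extent to which the data can be processed automatically. Building data standards that are generally applicable across a discipline from a unified set of data elements can greatly improve data interoperability. Taking microbial data standards as an example, this article introduces relevant data standards at three levels (biodiversity, strain resources, and sequence data), discusses the goals that biological data standards should achieve and the properties they should have, and uses ISO 21710:2020, the specification on data management and publication in microbial resource centers, to briefly illustrate how a microbial data standard is established. Finally, we list some of the difficulties that biological data standardization currently faces, along with possible solutions.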
Modern biology is not only an experimental science but also a data science. To produce human- and machine-actionable data and to increase the FAIRness of existing and future data, we need to introduce data standardization. Among the many approaches to data standardization, standardizing experimental results is the most practical in biological studies. Although all databases consist of a three-tier system (database management system, conversion software, and browser interface), each database builds these parts in its own way because of its particular purpose and user community. This greatly lowers the interoperability of the data. Using comparatively uniform data element standards within each domain can largely counter this problem. In this article, we demonstrate the standardization of biological data using microbial data standards, which are introduced at the biodiversity, microbial resource, and sequence levels. We also introduce an international standard, ISO 21710:2020, to show how a data standard for microbial resource centers can be established. Several issues that may hamper data standardization in biology are discussed at the end.
Authors
孙定中
马俊才
SUN Dingzhong; MA Juncai (Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, P.R. China)