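The concept of the "Structured Digital Object" (SDO) is one approach to representing scholarly information: it represents three important characteristics of information in an integrated way, namely granularity, structure, and content semantics. SDOs can be implemented on computers by describing them in XML (abbreviated SDO/XML), and SDO/XML has proven to be a feasible way to organize knowledge information in a structured manner in the Web era [1]. The SDO concept was formally presented in 2002 at the first International Conference on Digital Libraries hosted by the National Library of China [2]. The conceptual schema of SDO/XML and research on its applications in digital libraries and related fields were published as an original paper in January 2003 in the journal of the Japan Society of Information and Knowledge [3]. The exploration of its application to metadata was recognized at the Dublin Core conference held in Shanghai in 2004 [4]. As follow-up work, we applied this principle to the Chinese DOI system. We designed a DOI metadata XML Schema based on SDO/XML and implemented the system's core functions: DOI registration, resolution, and metadata query. This paper introduces SDO and its description method, SDO/XML. It briefly describes applications of SDO/XML in digital resource retrieval systems, digital journal publishing systems, and digital library systems. It presents the model and operation of a "Global Digital Library" (GDL) based on the SDO/XML concept and discusses interoperability among digital libraries. As the latest application of SDO/XML, it describes the use of SDO in the Chinese DOI system in some detail. Finally, it presents conclusions and reflections and outlines directions for future research.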
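The abstract names three core functions of the Chinese DOI system: registration, resolution, and metadata query. The following is a minimal Python sketch of how these functions fit together, under stated assumptions. It is not the authors' implementation. The `DoiRegistry` class, the XML element names (`object`, `part`, `title`, `creator`), the DOI `10.9999/example.001`, and the URL are all hypothetical. The nested `part` elements only suggest how an SDO-style record might expose granularity and structure alongside content.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical SDO-style metadata record: nested <part> elements expose the
# object's granularity and structure; element text carries the content.
SAMPLE_XML = """<object type="article">
  <title>Example title</title>
  <creator>Example Author</creator>
  <part type="section"><title>Introduction</title></part>
  <part type="section"><title>Method</title></part>
</object>"""


@dataclass
class DoiRegistry:
    """Hypothetical in-memory registry: DOI -> (location URL, parsed metadata)."""

    records: dict[str, tuple[str, ET.Element]] = field(default_factory=dict)

    def register(self, doi: str, url: str, metadata_xml: str) -> None:
        # Parse first so malformed metadata raises at registration time.
        metadata = ET.fromstring(metadata_xml)
        if doi in self.records:
            raise ValueError(f"DOI already registered: {doi}")
        self.records[doi] = (url, metadata)

    def resolve(self, doi: str) -> str:
        # Resolution: map the persistent identifier to the current location.
        return self.records[doi][0]

    def query(self, doi: str, path: str) -> list[str]:
        # Metadata query: ElementTree's limited XPath selects matching elements.
        _, metadata = self.records[doi]
        return [el.text or "" for el in metadata.findall(path)]


if __name__ == "__main__":
    registry = DoiRegistry()
    registry.register("10.9999/example.001", "https://example.org/articles/1", SAMPLE_XML)
    print(registry.resolve("10.9999/example.001"))  # -> https://example.org/articles/1
    print(registry.query("10.9999/example.001", "./part/title"))  # -> ['Introduction', 'Method']
```

The registry keeps the parsed tree rather than the raw string. As a result, each query works on a structured record, and the same identifier can answer questions at different granularities, such as the whole article or its individual sections.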
Data-intensive science is a reality in large scientific organizations such as the Max Planck Society. However, because our data practices are inefficient when it comes to integrating data from different sources, many projects cannot be carried out and many researchers are excluded. According to surveys, about 80% of the time in data-intensive projects is wasted. We must therefore conclude that we are not fit for the challenges that will come with billions of smart devices producing continuous streams of data: our methods do not scale. Experts worldwide are therefore looking for strategies and methods that have potential for the future. The first steps have been taken: there is now broad agreement, from the Research Data Alliance to the FAIR principles, that data should be associated with persistent identifiers (PIDs) and metadata (MD). In fact, after 20 years of experience, we can claim that trustworthy PID systems are already in broad use. It is argued, however, that assigning PIDs is just the first step. Suppose we agree to assign PIDs and also use the PID to store important relationships, such as pointers to locations where the bit sequences or different metadata can be accessed. We are then close to defining Digital Objects (DOs), which could indeed point to a solution for some of the basic problems in data management and processing. In addition to standardizing the way we assign PIDs, metadata, and other state information, we could also define a Digital Object Access Protocol. This would serve as a universal exchange protocol for DOs stored in repositories that use different data models and data organizations. We could also associate with each DO a type and a set of operations allowed to work on its content. This would pave the way to automatic processing, which has been identified as the major step toward scalability in data science and the data industry. A globally connected group of experts is now working on establishing testbeds for a DO-based data infrastructure.
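To make the idea of a Digital Object more concrete, here is a minimal Python sketch. It assumes a PID record that stores the object's type and pointers to where its bit sequences and metadata can be accessed. It also assumes a type registry that lists the operations allowed on each type. All names here are hypothetical: `PidRecord`, `Repository`, `TYPE_OPERATIONS`, `csv_row_count`, the `21.T99999/abc-123` identifier, and the example URLs. This is not the Digital Object Access Protocol or any Handle System API. It only illustrates how binding operations to types enables automatic processing.

```python
from dataclasses import dataclass, field
from typing import Callable


def csv_row_count(data: bytes) -> int:
    """Count data rows in a CSV bit sequence, excluding the header line."""
    lines = data.decode("utf-8").strip().splitlines()
    return max(len(lines) - 1, 0)


# Hypothetical type registry: each DO type lists the operations allowed on
# its content. Each operation takes the raw bit sequence and returns a result.
TYPE_OPERATIONS: dict[str, dict[str, Callable[[bytes], object]]] = {
    "text/csv": {"row_count": csv_row_count},
}


@dataclass
class PidRecord:
    """Hypothetical PID record: identifier, type, and pointers to bits/metadata."""

    pid: str
    do_type: str
    bitstream_locations: list[str]
    metadata_locations: list[str] = field(default_factory=list)


class Repository:
    """Toy repository that resolves PIDs and runs type-bound operations."""

    def __init__(self) -> None:
        self._records: dict[str, PidRecord] = {}
        self._content: dict[str, bytes] = {}  # location -> bit sequence

    def deposit(self, record: PidRecord, data: bytes) -> None:
        # Store the record and make the bits available at every listed location.
        self._records[record.pid] = record
        for location in record.bitstream_locations:
            self._content[location] = data

    def resolve(self, pid: str) -> PidRecord:
        return self._records[pid]

    def invoke(self, pid: str, operation: str) -> object:
        # Only operations registered for the object's type may run on it.
        record = self.resolve(pid)
        allowed = TYPE_OPERATIONS.get(record.do_type, {})
        if operation not in allowed:
            raise ValueError(
                f"operation {operation!r} not allowed for type {record.do_type!r}"
            )
        data = self._content[record.bitstream_locations[0]]
        return allowed[operation](data)


if __name__ == "__main__":
    repo = Repository()
    record = PidRecord(
        pid="21.T99999/abc-123",
        do_type="text/csv",
        bitstream_locations=["https://repo-a.example.org/objects/abc-123"],
        metadata_locations=["https://md.example.org/abc-123.xml"],
    )
    repo.deposit(record, b"a,b\n1,2\n3,4\n")
    print(repo.invoke("21.T99999/abc-123", "row_count"))  # -> 2
```

The allowed operations are looked up from the record's type. A client can therefore process any DO it can resolve without knowing in advance which repository or data model holds the bits. Rejecting operations that are not registered for a type keeps that contract explicit.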