The Internet of Things (IoT) is a recent technology which implies the union of objects, “things”, into a single worldwide network. This promising paradigm faces many design challenges associated with the dramatic increase in the number of end-devices. Device identification is one of these challenges, and it becomes more complicated as the number of network devices grows. Despite this, there is still no universally accepted method of identifying things that satisfies all requirements of existing IoT devices and applications. In this regard, one of the most important problems is choosing an identification system for all IoT devices connected to public communication networks. Many software and hardware solutions are used as unique global identifiers; however, such solutions have many limitations. This article proposes a novel solution, based on the Digital Object Architecture (DOA), that meets the requirements for identifying devices and applications in the IoT. This work analyzes the benefits of using the DOA as an identification platform in modern telecommunication networks. We propose a model of an identification system based on the architecture of digital objects that differs from well-known ones. The proposed model ensures an acceptable quality of service (QoS) within the common architecture of existing public communication networks. A novel interaction architecture is developed by introducing a Middle Handle Register (MHR) between the global register, i.e., the Global Handle Register (GHR), and the local registers, i.e., Local Handle Registers (LHR). Aspects of network interaction and the compatibility of IoT end-devices with integrated DOA identifiers in heterogeneous communication networks are presented. The developed model is simulated for a wide-area network with allocated registers, and the results are presented and discussed.
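To make the register hierarchy concrete, here is a minimal Python sketch of resolution through a three-tier GHR/MHR/LHR tree, showing how a middle register lets regional lookups avoid the global register. All class names, prefixes, and record fields are hypothetical illustrations of the idea, not the Handle System protocol or the paper's implementation.

```python
# Hypothetical three-tier handle resolution (GHR -> MHR -> LHR). A sketch of
# the delegation idea only; it does not speak the real Handle System protocol.

class Register:
    def __init__(self, name):
        self.name = name
        self.routes = {}   # handle prefix -> child register
        self.records = {}  # full handle -> record (held by LHRs)

    def resolve(self, handle):
        if handle in self.records:          # leaf register: record found
            return self.records[handle]
        prefix = handle.split("/")[0]
        child = self.routes.get(prefix)
        if child is None:
            raise KeyError(f"{self.name}: no route for prefix {prefix}")
        return child.resolve(handle)        # delegate down the tree

# Build the hierarchy: the GHR delegates to an MHR, which delegates to an LHR.
ghr, mhr, lhr = Register("GHR"), Register("MHR"), Register("LHR-1")
ghr.routes["20.5000"] = mhr
mhr.routes["20.5000"] = lhr
lhr.records["20.5000/device-42"] = {"type": "IoT-device", "ip": "10.0.0.42"}

# A device in the MHR's region resolves locally, without touching the GHR.
print(mhr.resolve("20.5000/device-42"))
```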
Data-intensive science is a reality in large scientific organizations such as the Max Planck Society, but due to the inefficiency of our data practices when it comes to integrating data from different sources, many projects cannot be carried out and many researchers are excluded. Since, according to surveys, about 80% of the time in data-intensive projects is wasted, we must conclude that we are not fit for the challenges that will come with the billions of smart devices producing continuous streams of data: our methods do not scale. Therefore, experts worldwide are looking for strategies and methods that have potential for the future. The first steps have been made, since there is now wide agreement, from the Research Data Alliance to the FAIR principles, that data should be associated with persistent identifiers (PIDs) and metadata (MD). In fact, after 20 years of experience, we can claim that trustworthy PID systems are already in broad use. It is argued, however, that assigning PIDs is just the first step. If we agree to assign PIDs and also use the PID to store important relationships, such as pointers to the locations where the bit sequences or different metadata can be accessed, we are close to defining Digital Objects (DOs), which could indeed indicate a solution to some of the basic problems in data management and processing. In addition to standardizing the way we assign PIDs, metadata, and other state information, we could also define a Digital Object Access Protocol as a universal exchange protocol for DOs stored in repositories using different data models and data organizations. We could further associate a type with each DO and a set of operations allowed to work on its content, which would pave the way to automatic processing, identified as the major step towards scalability in data science and data industry. A globally connected group of experts is now working on establishing testbeds for a DO-based data infrastructure.
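As a concrete reading of this definition, the following sketch models a DO as a PID record holding pointers to bit-sequence and metadata locations, with a type that selects the permitted operations. The field names and the type-to-operations table are assumptions for illustration, not a standardized schema.

```python
# Minimal sketch of a Digital Object: a PID that stores the locations of the
# bit sequences and metadata, plus a type governing allowed operations.
# Field names are illustrative, not a standard. Requires Python 3.9+.

from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    pid: str                          # persistent identifier
    do_type: str                      # type that determines allowed operations
    bit_locations: list[str]          # where the bit sequences can be accessed
    metadata_locations: list[str]     # where descriptive metadata lives
    state: dict = field(default_factory=dict)

OPERATIONS_BY_TYPE = {
    # type -> operations a repository would accept for DOs of that type
    "timeseries": ["read", "subset", "plot"],
    "image":      ["read", "thumbnail"],
}

def allowed_operations(do: DigitalObject) -> list[str]:
    return OPERATIONS_BY_TYPE.get(do.do_type, ["read"])

do = DigitalObject(
    pid="21.T11148/abc123",
    do_type="timeseries",
    bit_locations=["https://repo.example.org/data/abc123"],
    metadata_locations=["https://repo.example.org/md/abc123"],
)
print(allowed_operations(do))  # ['read', 'subset', 'plot']
```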
A key limiting factor in organising and using information from physical specimens curated in natural science collections is making that information computable, with institutional digitization tending to focus more on imaging the specimens themselves than on efficiently capturing computable data about them. Label data are still largely transcribed manually, at high cost and low throughput, making the task prohibitive for many collection-holding institutions at current funding levels. We show how computer vision, optical character recognition, handwriting recognition, named entity recognition, and language translation technologies can be implemented into canonical workflow component libraries with findable, accessible, interoperable, and reusable (FAIR) characteristics. These libraries are being developed in a cloud-based workflow platform, the 'Specimen Data Refinery' (SDR), founded on the Galaxy workflow engine, Common Workflow Language, Research Object Crates (RO-Crate), and WorkflowHub technologies. The SDR can be applied to specimens' labels and other artefacts, offering the prospect of greatly accelerated and more accurate data capture in computable form. Two kinds of FAIR Digital Objects (FDOs) are created by packaging outputs of SDR workflows and workflow components as digital objects with metadata, a persistent identifier, and a specific type definition. The first kind of FDO are computable Digital Specimen (DS) objects that can be consumed and produced by workflows and other applications. A single DS is the input data structure submitted to a workflow and is modified by each workflow component in turn to produce a refined DS at the end. The Specimen Data Refinery provides a library of such components that can be used individually or in series. To co-function, each library component describes the fields it requires from the DS and the fields it will in turn populate or enrich. The second kind of FDO, RO-Crates, gather and archive the diverse set of digital and real-world resources, configurations, and actions (the provenance) contributing to a unit of research work, allowing that work to be faithfully recorded and reproduced. Here we describe the Specimen Data Refinery with its motivating requirements, focusing on what is essential in the creation of canonical workflow component libraries and on its conformance with the requirements of the emerging FDO Core Specification being developed by the FDO Forum. This work received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements 823827 (SYNTHESYS Plus), 871043 (DiSSCo Prepare), 823830 (BioExcel-2), and 824087 (EOSC-Life).
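The co-functioning contract described above can be pictured with a small sketch: each component declares the DS fields it requires and the fields it populates, and a runner threads one DS through the chain. Class and field names are invented for illustration and are not the SDR's actual interfaces.

```python
# Sketch of chainable workflow components refining one Digital Specimen (DS).
# Each component states its required and populated fields, so a runner can
# validate and compose them. Names are illustrative only.

class Component:
    requires: set = set()
    populates: set = set()

    def run(self, ds: dict) -> dict:
        missing = self.requires - set(ds)
        if missing:
            raise ValueError(f"{type(self).__name__} missing fields: {missing}")
        return self.apply(ds)

    def apply(self, ds: dict) -> dict:
        raise NotImplementedError

def fake_ocr(image_ref):  # placeholder so the sketch runs without an OCR engine
    return f"transcribed({image_ref})"

class OCRComponent(Component):
    requires = {"label_image"}
    populates = {"label_text"}

    def apply(self, ds):
        ds["label_text"] = fake_ocr(ds["label_image"])
        return ds

def refine(ds, components):
    for c in components:      # each component refines the same DS in turn
        ds = c.run(ds)
    return ds

print(refine({"label_image": "img-001.png"}, [OCRComponent()]))
```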
In this paper we present the Reproducible Research Publication Workflow (RRPW) as an example of how generic canonical workflows can be applied to a specific context. The RRPW includes the essential steps between submission and final publication of a manuscript and the research artefacts (i.e., data, code, etc.) that underlie the scholarly claims in the manuscript. A key aspect of the RRPW is the inclusion of artefact review and metadata creation as part of the publication workflow. The paper discusses a formalized technical structure around a set of canonical steps, which helps codify and standardize the process for researchers, curators, and publishers. The proposed application of canonical workflows can help achieve the goals of improved transparency and reproducibility, increase FAIR compliance of all research artefacts at all steps, and facilitate better exchange of annotated and machine-readable metadata. This work received funding from the Institute of Museum and Library Services (RE-36-19-0081-19).
Purpose: This study attempts to propose an abstract model by gathering concepts that focus on resource representation and description in a digital curation model, and to suggest a conceptual model that emphasizes semantic enrichment. Design/methodology/approach: This study conducts a literature review to analyze the preceding curation models DCC CLM, DCC&U, UC3, and DCN. Findings: The concept of semantic enrichment is expressed in a single word, SEMANTIC; the elements of the Semantic Enrichment Model (subject, extraction, multi-language, authority, network, thing, identity, and connect) form this acronym. Research limitations: This study does not reflect the actual information environment because it focuses on the concepts of the representation of digital objects. Practical implications: This study presents the main considerations for creating and reinforcing the description and representation of digital objects when building and developing digital curation models in specific institutions. Originality/value: This study summarizes the elements that should be emphasized in the representation of digital objects in terms of information organization. This work was supported by a research grant from Seoul Women's University (2020) and financially supported by Hansung University.
Purpose: To develop a structured, rich media digital paper authoring tool with an object-based model that enables interactive, playable, and convertible functions. Design/methodology/approach: We propose Dpaper to organize the content (text, data, rich media, etc.) of dissertation papers as XML and HTML5 files by means of digital objects and digital templates. Findings: Dpaper provides a structured-paper editorial platform for PhD authors to organize research materials and to generate various digital paper objects that are playable and reusable. The PhD papers are represented as Web pages and structured XML files, which are marked with semantic tags. Research limitations: The proposed tool only provides access to a limited number of digital objects. For instance, the tool cannot create equations and graphs, and typesetting is not yet flexible compared to MS Word. Practical implications: The Dpaper tool is designed to break through the patterns of unstructured content organization of traditional papers, and makes the paper accessible not only for reading but for exploitation as data, where the document can be extracted and reused. As a result, Dpaper can make the digital publishing of dissertation texts more flexible and efficient, and their data more accessible. Originality/value: The Dpaper tool solves the challenge of making a paper structured and object-based at the authoring stage, and has practical value for semantic publishing. This work was funded by the Chinese Academy of Sciences Rich Media Digital Dissertation Authoring Tools (iDissertation) project.
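To illustrate the object-based organization, here is a hedged sketch of what semantically tagged, object-based XML for a paper fragment could look like; the element and attribute names are hypothetical, not Dpaper's actual schema.

```python
# Sketch of object-based paper content: each content unit is a typed,
# identified object inside semantically tagged XML, so it can be extracted
# and reused independently of the surrounding text. Element names are invented.

import xml.etree.ElementTree as ET

paper = ET.Element("paper", {"template": "dissertation"})
section = ET.SubElement(paper, "section", {"role": "methods"})

# A text object and a dataset object, each addressable by its own id.
ET.SubElement(section, "object", {"type": "text", "id": "obj-1"}).text = (
    "We measured the samples at 25 degrees Celsius."
)
ET.SubElement(section, "object", {"type": "dataset", "id": "obj-2",
                                  "href": "data/measurements.csv"})

print(ET.tostring(paper, encoding="unicode"))
```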
This paper presents a component object model (COM) based framework for managing, analyzing, and visualizing massive multi-scale digital elevation models (DEMs). The framework consists of a data management component (DMC) based on an RDBMS/ORDBMS, a data analysis component (DAC), and a data render component (DRC). The DMC manages massive multi-scale data expressed in various reference frames within a pyramid database and supports fast access to data at variable resolution. The DAC integrates many useful applied analytic functions whose results can be overlaid on the 3D scene rendered by the DRC. The DRC provides view-dependent data paging with the support of the underlying DMC and organizes the potentially visible data at different levels for rendering.
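The view-dependent paging idea can be sketched briefly: choose a pyramid level from the viewing distance, then request only the tiles that intersect the view. The formulas and parameter values below are illustrative assumptions, not the framework's code.

```python
# Sketch of view-dependent paging over a DEM pyramid: a coarser level for
# farther views, and only the visible tiles paged in. Illustrative only.

import math

def choose_level(view_distance_m, base_res_m=1.0, n_levels=8):
    # Each pyramid level halves resolution; farther views use coarser levels.
    level = int(max(0, math.log2(max(view_distance_m, 1.0) / (256 * base_res_m))))
    return min(level, n_levels - 1)

def visible_tiles(view_rect, tile_size_m, level):
    # Tile extent doubles per level; return (level, row, col) keys to request.
    size = tile_size_m * (2 ** level)
    x0, y0, x1, y1 = view_rect
    return [(level, r, c)
            for r in range(int(y0 // size), int(y1 // size) + 1)
            for c in range(int(x0 // size), int(x1 // size) + 1)]

level = choose_level(view_distance_m=5000)
print(level, visible_tiles((0, 0, 2000, 1500), tile_size_m=256, level=level))
```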
By using a spherical wave as the reference wave, we recorded the in-line phase-shifting digital hologram of the 25th element of the Chinese standard No. 3 resolution test pattern and obtained the corresponding numerical reconstruction. Some problems concerning digital hologram recording and reconstruction of a diffractive object at a short distance are discussed. The experimental result shows that the resolution of the reconstructed image is better than 10 μm, which is the limit of this experimental arrangement.
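For orientation, the following sketch shows the standard four-step phase-shifting recovery of the object wave followed by angular-spectrum back-propagation, which is suitable for short recording distances. The wavelength, pixel pitch, distance, and the paraxial spherical reference are illustrative assumptions, not the values or formulation used in the experiment.

```python
# Hedged sketch: four-step phase-shifting holography (shifts 0, pi/2, pi,
# 3*pi/2) recovers O*conj(R) = [(I1 - I3) + 1j*(I2 - I4)] / 4, then the field
# is back-propagated with the angular spectrum method. Values are illustrative.

import numpy as np

def object_wave(I1, I2, I3, I4, R):
    return ((I1 - I3) + 1j * (I2 - I4)) / (4 * np.conj(R))

def angular_spectrum(field, wavelength, dx, z):
    # Propagate a complex field a distance z (scalar transfer function).
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    H = np.exp(1j * 2 * np.pi / wavelength * z * np.sqrt(np.maximum(arg, 0)))
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Illustrative geometry: 632.8 nm laser, 4.65 um pixels, object 50 mm away.
wl, dx, z = 632.8e-9, 4.65e-6, 50e-3
n = 512
x = (np.arange(n) - n / 2) * dx
X, Y = np.meshgrid(x, x)
R = np.exp(1j * np.pi / (wl * z) * (X**2 + Y**2))  # paraxial spherical reference

O_true = np.ones((n, n)) * np.exp(1j * 0.3)        # stand-in object wave
I = [np.abs(O_true + R * np.exp(1j * d)) ** 2
     for d in (0, np.pi / 2, np.pi, 3 * np.pi / 2)]
O = object_wave(*I, R)                             # recovered at the sensor
image = angular_spectrum(O, wl, dx, -z)            # back-propagate to the object
print(np.allclose(O, O_true), image.shape)         # exact recovery in this toy case
```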
Digital speckle pattern interferometry (DSPI) is a high-precision deformation measurement technique for planar objects. For curved objects, however, three-dimensional (3D) shape information is needed in order to obtain correct deformation measurements. Thus, combined shape and deformation measurement techniques for DSPI have been proposed, but the current techniques are either complex in setup or complicated in operation, and some are too slow for real-time measurement. In this work, we propose a DSPI technique for both 3D shape and out-of-plane deformation measurement. Compared with current techniques, the proposed technique is simple in both setup and operation and is capable of fast deformation measurement. Theoretical analysis and experiments are performed. For a cylindrical surface with an arch height of 9 mm, the error of out-of-plane deformation measurement is less than 0.15 μm, verifying the effectiveness of the proposed scheme. This work was supported by the National Key Research and Development Project of China (No. 2016YFF0200700) and the National Natural Science Foundation of China (No. 61405111).
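As background for the out-of-plane measurement, the widely used DSPI relation for near-normal illumination and observation converts an unwrapped phase change into displacement, w = λΔφ/(4π). The short sketch below applies it with illustrative values; the paper's combined shape-and-deformation scheme goes beyond this.

```python
# Standard DSPI out-of-plane relation for near-normal illumination/observation:
# w = (lambda / (4*pi)) * delta_phi. Values below are illustrative only.

import numpy as np

def out_of_plane_displacement(delta_phi, wavelength=632.8e-9):
    """Convert unwrapped DSPI phase change (rad) to displacement (m)."""
    return wavelength * delta_phi / (4 * np.pi)

delta_phi = np.linspace(0, 6 * np.pi, 5)   # unwrapped phase across the surface
w = out_of_plane_displacement(delta_phi)
print(w * 1e6)  # displacement in micrometres; 6*pi of phase is about 0.95 um
```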
We examine the intersection of the FAIR principles (Findable, Accessible, Interoperable, and Reusable), the challenges and opportunities presented by the aggregation of widely distributed and heterogeneous data about biological and geological specimens, and the use of the Digital Object Architecture (DOA) data model and components as an approach to solving those challenges that offers adherence to the FAIR principles as an integral characteristic. This approach will be prototyped in the Distributed System of Scientific Collections (DiSSCo) project, the pan-European Research Infrastructure which aims to unify over 110 natural science collections across 21 countries. We take each of the FAIR principles, discuss them as requirements in the creation of a seamless virtual collection of bio/geo specimen data, and map those requirements to Digital Object components and facilities such as persistent identification, extended data typing, and the use of an additional level of abstraction to normalize existing heterogeneous data structures. The FAIR principles inform and motivate the work, and the DO Architecture provides the technical vision to create the seamless virtual collection vitally needed to address scientific questions of societal importance.
The FAIR principles have been accepted globally as guidelines for improving data-driven science and data management practices, yet the incentives for researchers to change their practices are presently weak. In addition, data-driven science has been slow to embrace workflow technology despite clear evidence of recurring practices. To overcome these challenges, the Canonical Workflow Frameworks for Research (CWFR) initiative suggests a large-scale introduction of self-documenting workflow scripts to automate recurring processes or fragments thereof. This standardised approach, with FAIR Digital Objects as anchors, will be a significant milestone in the transition to FAIR data without adding load onto the researchers who stand to benefit most from it. This paper describes the CWFR approach and the activities of the CWFR initiative over the course of the last year or so, highlights several projects that hold promise for CWFR approaches, including Galaxy, Jupyter Notebook, and RO-Crate, and concludes with an assessment of the state of the field and the challenges ahead.
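A self-documenting workflow script can be pictured with a minimal sketch: each step logs its inputs, outputs, and a timestamp into a provenance record that could later be packaged as a FAIR Digital Object. The decorator and record format are assumptions for illustration, not a CWFR specification.

```python
# Sketch of a self-documenting workflow: a decorator records every step's
# inputs, output, and timestamp, so documentation accrues as a side effect.
# The record format is an assumption, not a CWFR or FDO standard.

import functools, json, time

PROVENANCE = []  # accumulates one record per executed step

def canonical_step(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        PROVENANCE.append({
            "step": func.__name__,
            "inputs": [repr(a) for a in args],
            "output": repr(result),
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        })
        return result
    return wrapper

@canonical_step
def clean(values):
    return [v for v in values if v is not None]

@canonical_step
def mean(values):
    return sum(values) / len(values)

mean(clean([1.0, None, 3.0]))
print(json.dumps(PROVENANCE, indent=2))  # the documentation comes for free
```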
The introduction of a new technology or innovation is often accompanied by “ups and downs” in its fortunes. Gartner Inc. defined a so-called hype cycle to describe a general pattern that many innovations experience: technology trigger, peak of inflated expectations, trough of disillusionment, slope of enlightenment, and plateau of productivity. This article compares the ongoing introduction of Open Science (OS) with the hype cycle model and speculates on the relevance of that model to OS. Lest the title of this article mislead the reader, be assured that the author believes that OS should happen and that it will happen. However, I also believe that the path to OS will be longer than many of us had hoped. I give a brief history of today's “semi-open” science, define what I mean by OS, define the hype cycle and where OS is now on that cycle, and finally speculate on what it will take to traverse the cycle and rise to its plateau of productivity (as described by Gartner).
There is a huge gap between (1) the state of workflow technology on the one hand and the practices in the many labs working with data-driven methods on the other, and (2) the awareness of the FAIR principles and the lack of changes in practices during the last 5 years. The CWFR concept has been defined to combine these two intentions: increasing the use of workflow technology and improving FAIR compliance. In the study described in this paper, we indicate how this could be applied to machine learning, which is now used by almost all research disciplines, with the well-known effect of a huge lack of repeatability and reproducibility. Researchers will only change practices if they can work efficiently and are not loaded with additional tasks. A comprehensive CWFR framework would be an umbrella for all steps that need to be carried out to do machine learning on selected data collections, and would immediately create comprehensive, FAIR-compliant documentation. The researcher is guided by such a framework, and information once entered can easily be shared and reused. The many iterations normally required in machine learning can be dealt with efficiently using CWFR methods. Libraries of components that can be easily orchestrated, using FAIR Digital Objects as a common entity to document all actions and to exchange information between steps without the researcher needing to understand anything about PID and FDO details, are probably the way to increase efficiency in repeating research workflows. As the Galaxy project indicates, the availability of supporting tools will be important in getting researchers to use these methods. Unlike what the Galaxy framework suggests, however, it would be necessary to include all steps needed for a machine learning task, including those that require human interaction, and to document all phases with the help of structured FDOs.
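The point about iterations can be made concrete with a small sketch in which every training run emits an FDO-like record (data PID, parameters, metric), so repeated runs stay documented and comparable at no extra cost to the researcher. All identifiers and fields are hypothetical.

```python
# Sketch of documented ML iterations: each run of a hyperparameter sweep is
# captured as an FDO-like record, keeping the whole sweep comparable and
# repeatable. The PID and record fields are invented for illustration.

import itertools, random

def train(data_pid, learning_rate, depth):
    random.seed(hash((data_pid, learning_rate, depth)) % 2**32)
    return round(random.uniform(0.7, 0.95), 3)   # stand-in for a real fit/score

runs = []
for lr, depth in itertools.product([0.01, 0.1], [3, 5]):
    score = train("21.T11148/dataset-7", lr, depth)
    runs.append({                     # one FDO-like record per iteration
        "data": "21.T11148/dataset-7",
        "params": {"learning_rate": lr, "depth": depth},
        "accuracy": score,
    })

best = max(runs, key=lambda r: r["accuracy"])
print(best)   # the whole sweep stays documented, not just the winning run
```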
The overall expectation for introducing Canonical Workflows for Experimental Research and FAIR Digital Objects (FDOs) can be summarised as reducing the gap between workflow technology and research practices, making experimental work more efficient and improving FAIRness without adding administrative load on the researchers. In this document, we describe, with the help of an example, how CWFR could work in detail and improve research procedures. We have chosen the example of "experiments with human subjects", which stretches from planning an experiment to storing the collected data in a repository. While we focus on experiments with human subjects, we are convinced that CWFR can be applied to many other data generation processes based on experiments. The main challenge is to identify repeating patterns in existing research practices that can be abstracted to create CWFR. We include detailed examples from different disciplines to demonstrate that CWFR can be implemented without violating specific disciplinary or methodological requirements. We do not claim to be comprehensive in all aspects, since these examples are meant to prove the concept of CWFR.
In this paper we present the derivation of Canonical Workflow Modules from current workflows in simulation-based climate science, in support of the elaboration of a corresponding framework for simulation-based research. We first identified the different users and user groups in simulation-based climate science based on their reasons for using the resources provided at the German Climate Computing Center (DKRZ). What is special here is that the DKRZ provides the climate science community with resources like high performance computing (HPC), data storage, and specialised services, and hosts the World Data Center for Climate (WDCC); users can therefore perform their entire research workflows, up to the publication of the data, on the same infrastructure. Our analysis shows that the resources are used by two primary user types: those who require the HPC system to perform resource-intensive simulations and subsequently analyse them, and those who reuse, build on, and analyse existing data. We then further subdivided these top-level user categories based on their specific goals and analysed the typical, idealised workflows applied to achieve the respective project goals. We find that, due to the subdivision and further granulation of the user groups, the workflows show apparent differences. Nevertheless, similar "Canonical Workflow Modules" can clearly be identified. These modules are "Data and Software (Re)use", "Compute", "Data and Software Storing", "Data and Software Publication", and "Generating Knowledge", and in their entirety they form the basis for a Canonical Workflow Framework for Research (CWFR). It is desirable that parts of the workflows in a CWFR act as FDOs, but we view this aspect critically. We also reflect on the question of whether the derivation of Canonical Workflow Modules from the analysis of current user behaviour still holds for future systems and work processes. This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy, EXC 2037 CLICCS (Climate, Climatic Change, and Society), Project No. 390683824.
We introduce the concept of Canonical Workflow Building Blocks (CWBB), a methodology for describing and wrapping computational tools so that they can be utilised in a reproducible manner from multiple workflow languages and execution platforms. The concept is implemented and demonstrated with the BioExcel Building Blocks library (BioBB), a collection of tool wrappers in the field of computational biomolecular simulation. Interoperability across different workflow languages is showcased through a protein Molecular Dynamics setup transversal workflow, built using this library and run with 5 different Workflow Manager Systems (WfMS). We argue that such practice is a necessary requirement for FAIR Computational Workflows and an element of Canonical Workflow Frameworks for Research (CWFR), in order to improve widespread adoption and reuse of computational methods across workflow language barriers. This work was funded by the European Union under contracts H2020-INFRAEDI-02-2018 823830 and H2020-EINFRA-2015-1 675728, and through EOSC-Life (https://www.eosc-life.eu) contract H2020-INFRAEOSC-2018-2 824087 and ELIXIR-CONVERGE (https://elixir-europe.org) contract H2020-INFRADEV-2019-2 871075.
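The wrapping pattern can be sketched generically: one thin, declarative entry point (fixed input/output paths plus a single options dict, exposed as a CLI) that any workflow engine, or a plain shell, can invoke identically. This mirrors the CWBB idea but is not the actual BioBB API.

```python
# Generic sketch of a canonical building block: a uniform CLI facade over a
# wrapped tool, so CWL, Galaxy, Nextflow, or bash can all call it the same way.
# This illustrates the pattern only; it is not the BioBB library's interface.

import argparse, json, shutil

def run_block(input_path: str, output_path: str, properties: dict) -> None:
    """The block's single entry point: fixed I/O paths, options in one dict."""
    # Stand-in for invoking the wrapped tool (e.g. a simulation setup step).
    shutil.copyfile(input_path, output_path)
    print(f"processed {input_path} -> {output_path} with {properties}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Canonical building block wrapper")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--config", default="{}",
                        help="tool options as a JSON object")
    args = parser.parse_args()
    run_block(args.input, args.output, json.loads(args.config))
```

Because every block shares this shape, a workflow engine only needs one adapter to drive the whole library, which is what makes cross-language reuse cheap.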
Data Science (DS) as defined by Jim Gray is an emerging paradigm in all research areas, helping to find non-obvious patterns of relevance in large distributed data collections. “Open Science by Design” (OSD), i.e., making artefacts such as data, metadata, models, and algorithms available and reusable to peers and beyond as early as possible, is a prerequisite for a flourishing DS landscape. However, a few major aspects can be identified that hamper a fast transition: (1) The classical “Open Science by Publication” (OSP) is no longer sufficient, since it serves different functions, leads to unacceptable delays, and is associated with high curation costs; changing data lab practices towards OSD requires more fundamental changes than OSP. (2) The classical publication-oriented models for metrics, mainly informed by citations, will not work anymore, since the roles of contributors are more difficult to assess and will often change, i.e., other ways of assigning incentives and recognition need to be found. (3) The huge investments in developing DS skills and capacities by some global companies and strong countries are leading to imbalances and fears among different stakeholders, hampering the acceptance of Open Science (OS). (4) Finally, OSD will depend on the availability of a global infrastructure fostering an integrated and interoperable data domain, “one data-domain” as George Strawn calls it, which is still not visible due to differences about the technological key pillars. OS therefore is a need for DS, but it will take much more time to implement than we may have expected.