Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 90104002, 69725003).
Abstract: In this paper, we study resource management models and algorithms that satisfy multiple performance objectives simultaneously. We extend the QoS model based on the proportional fairness principle, which defines both the delay and loss rate requirements of a class, to include fairness, which is important for the integration of multiple service classes. The resulting Proportional Fairness Scheduling model formalizes the goals of network performance, users’ QoS requirements and system fairness, and exposes the fundamental tradeoffs between these goals. In particular, it is difficult to satisfy all of these objectives simultaneously. We propose a novel scheduling algorithm called Proportional Fairness Scheduling (PFS) that approximates the model closely and efficiently. We have implemented PFS in Linux. Through simulation and measurement experiments, we evaluate the delay and loss rate proportional fairness of PFS and determine its computational overhead.
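The abstract does not spell out how PFS selects packets, so the following is only an illustrative sketch of one common way to approximate proportional delay differentiation, not the authors' algorithm. It assumes a waiting-time-priority rule: each class `i` has a hypothetical differentiation parameter `deltas[i]`, and the scheduler serves the backlogged class whose head-of-line packet has the largest waiting time divided by that parameter, which pushes the ratio of class delays toward the ratio of the parameters. The names `pick_class`, `queues` and `deltas` are assumptions introduced for this example.

```python
from collections import deque


def pick_class(queues, deltas, now):
    """Return the index of the class to serve next, or None if all are empty.

    queues: one deque per class holding packet arrival times (oldest first).
    deltas: per-class delay differentiation parameters (hypothetical);
            a smaller value means the class should see a smaller delay.
    now:    current time, in the same unit as the arrival times.
    """
    best_index = None
    best_score = -1.0
    for index, queue in enumerate(queues):
        if not queue:
            continue  # nothing to send for this class
        # Normalized head-of-line waiting time for this class.
        score = (now - queue[0]) / deltas[index]
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Two classes whose target delay ratio is 1:2.
queues = [deque([0.0]), deque([0.0])]
deltas = [1.0, 2.0]
chosen = pick_class(queues, deltas, now=4.0)
```

With equal waiting times, the class with the smaller parameter is served first; a class with a larger parameter is served only once its packets have waited proportionally longer.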
Funding: The work of A. Jacobsen, C. Evelo, M. Thompson, R. Cornet, R. Kaliyaperuma and M. Roos is supported by funding from the European Union’s Horizon 2020 research and innovation program under the EJP RD COFUND-EJP N° 825575. The work of A. Jacobsen, C. Evelo, C. Goble, M. Thompson, N. Juty, R. Hooft, M. Roos, S-A. Sansone, P. McQuilton, P. Rocca-Serra and D. Batista is supported by funding from ELIXIR EXCELERATE, H2020 grant agreement number 676559. R. Hooft was further funded by NL NWO NRGWI.obrug.2018.009. N. Juty and C. Goble were funded by CORBEL (H2020 grant agreement 654248). N. Juty, C. Goble, S-A. Sansone, P. McQuilton, P. Rocca-Serra and D. Batista were funded by FAIRplus (IMI grant agreement 802750). N. Juty, C. Goble, M. Thompson, M. Roos, S-A. Sansone, P. McQuilton, P. Rocca-Serra and D. Batista were funded by EOSClife H2020-EU (grant agreement number 824087). C. Goble was funded by DMMCore (BBSRC BB/M013189/). M. Thompson and M. Roos received funding from NWO (VWData 400.17.605). S-A. Sansone, P. McQuilton, P. Rocca-Serra and D. Batista have been funded by grants awarded to S-A. Sansone from the UK BBSRC and Research Councils (BB/L024101/1, BB/L005069/1), EU (H2020-EU 634107, H2020-EU 654241), IMI (IMPRiND 116060), NIH Data Common Fund, and from the Wellcome Trust (ISA-InterMine 212930/Z/18/Z, FAIRsharing 208381/A/17/Z). The work of A. Waagmeester has been funded by grant award number GM089820 from the National Institutes of Health. M. Kersloot was funded by the European Regional Development Fund (KVW-00163). The work of N. Meyers was funded by the National Science Foundation (OAC 1839030). The work of M.D. Wilkinson is funded by Isaac Peral/Marie Curie cofund with the Universidad Politecnica de Madrid and the Ministerio de Economia y Competitividad grant number TIN2014-55993-RM. The work of B. Magagna, E. Schultes, L. da Silva Santos and K. Jeffery is funded by the H2020-EU 824068. The work of B. Magagna, E. Schultes and L. da Silva Santos is funded by the GO FAIR ISCO grant of the Dutch Ministry of Science and Culture. The work of G. Guizzardi is supported by the OCEAN Project (FUB). M. Courtot received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No. 802750. R. Cornet was further funded by FAIR4Health (H2020-EU grant agreement number 824666). K. Jeffery received funding from EPOS-IP H2020-EU agreement 676564 and ENVRIplus H2020-EU agreement 654182.
Abstract: The FAIR principles have been widely cited, endorsed and adopted by a broad range of stakeholders since their publication in 2016. By intention, the 15 FAIR guiding principles do not dictate specific technological implementations, but provide guidance for improving Findability, Accessibility, Interoperability and Reusability of digital resources. This has likely contributed to the broad adoption of the FAIR principles, because individual stakeholder communities can implement their own FAIR solutions. However, it has also resulted in inconsistent interpretations that carry the risk of leading to incompatible implementations. Thus, while the FAIR principles are formulated on a high level and may be interpreted and implemented in different ways, for true interoperability we need to support convergence in implementation choices that are widely accessible and (re)-usable. We introduce the concept of FAIR implementation considerations to assist accelerated global participation and convergence towards accessible, robust, widespread and consistent FAIR implementations. Any self-identified stakeholder community may either choose to reuse solutions from existing implementations or, when they spot a gap, accept the challenge to create the needed solution, which, ideally, can be used again by other communities in the future. Here, we provide interpretations and implementation considerations (choices and challenges) for each FAIR principle.
Funding: M. Dumontier was supported by grants from NWO (400.17.605, 628.011.011), NIH (3OT3TR002027-01S1, 1OT3OD025467-01, 1OT3OD025464-01), H2020-EU EOSClife (824087) and ELIXIR, the research infrastructure for life-science data. R. de Miranda Azevedo was supported by grants from H2020-EU EOSClife (824087) and ELIXIR, the research infrastructure for life-science data.
Abstract: The FAIR principles were received with broad acceptance in several scientific communities. However, there is still some degree of uncertainty about how they should be implemented. Several self-report questionnaires have been proposed to assess the implementation of the FAIR principles. Moreover, the FAIRmetrics group released 14 general-purpose maturity metrics for representing FAIRness. Initially, these metrics were conducted as open-answer questionnaires. Recently, they have been implemented in software that can automatically harvest metadata from metadata providers and generate a principle-specific FAIRness evaluation. With so many different approaches to FAIRness evaluation, we believe that further clarification of their limitations and advantages, as well as of their interpretation and interplay, should be considered.
Abstract: The rapid evolution of Large Language Models (LLMs) highlights the necessity for ethical considerations and data integrity in AI development, particularly emphasizing the role of FAIR (Findable, Accessible, Interoperable, Reusable) data principles. While these principles are crucial for ethical data stewardship, their specific application in the context of LLM training data remains an under-explored area. This research gap is the focus of our study, which begins with an examination of existing literature to underline the importance of FAIR principles in managing data for LLM training. Building upon this, we propose a novel framework designed to integrate FAIR principles into the LLM development lifecycle. A contribution of our work is the development of a comprehensive checklist intended to guide researchers and developers in applying FAIR data principles consistently across the model development process. The utility and effectiveness of our framework are validated through a case study on creating a FAIR-compliant dataset aimed at detecting and mitigating biases in LLMs. We present this framework to the community as a tool to foster the creation of technologically advanced, ethically grounded, and socially responsible AI models.
Abstract: In a world awash with fragmented data and tools, the notion of Open Science has been gaining a lot of momentum, but simultaneously it has caused a great deal of anxiety. Some of the anxiety may be related to crumbling kingdoms, but there are also very legitimate concerns, especially about the relative role of machines and algorithms as compared to humans and the combination of both (i.e., social machines). There are also grave concerns about the connotations of the term “open”, as well as about the unwanted side effects and the scalability of the approaches advocated by early adopters of new methodological developments. Many of these concerns are associated with mind-machine interaction and the critical role that computers are now playing in our day-to-day scientific practice. Here we address a number of these concerns and provide some possible solutions. FAIR (machine-actionable) data and services are obviously at the core of Open Science (or rather FAIR science). The scalable and transparent routing of data, tools and compute (to run the tools on) is a key central feature of the envisioned Internet of FAIR Data and Services (IFDS). The European Commission in its Declaration on the European Open Science Cloud, the G7, and the USA data commons have all identified the need to ensure a solid and sustainable infrastructure for Open Science. Here we first define the term FAIR science as opposed to Open Science. In FAIR science, data and the associated tools are all Findable, Accessible under well-defined conditions, Interoperable and Reusable, but not necessarily “open”, that is, without restrictions, and certainly not always “gratis”. The ambiguous term “open” has already caused considerable confusion and also opt-out reactions from researchers and other data-intensive professionals who cannot make their data open for very good reasons, such as patient privacy or national security. Although Open Science is a definition for a way of working rather than an explicit request for all data to be available in full Open Access, the connotation of openness of the data involved in Open Science is very strong. In FAIR science, data and the associated services to run all processes in the data stewardship cycle, from design of experiment to capture, curation, processing, linking and analytics, all have minimally FAIR metadata, which specify the conditions under which the actual underlying research objects are reusable, first for machines and then also for humans. This effectively means that, properly conducted, Open Science is part of FAIR science. However, FAIR science can also be done with partly closed, sensitive and proprietary data. As has been emphasized before, FAIR is not identical to “open”. In FAIR/Open Science, data should be as open as possible and as closed as necessary. Where data are generated using public funding, the default will usually be that the FAIR data resulting from the study are as accessible as possible, and that more restrictive access and licensing policies on these data will have to be explicitly justified and described. In all cases, however, even if the reuse is restricted, data and related services should be findable for their major users, machines, which will also make them much easier to find for human users. With a tendency to make good data stewardship the norm, a very significant new market for distributed data analytics and learning is opening, and a plethora of tools and reusable data objects are being developed and released. These all need FAIR metadata to be routed to each other and to be effective.
Abstract: The FAIR principles, an acronym for Findable, Accessible, Interoperable and Reusable, are recognised worldwide as key elements for good practice in all data management processes. To understand how the Brazilian scientific community is adhering to these principles, this article reports Brazilian adherence to the GO FAIR initiative through the creation of the GO FAIR Brazil Office and the manner in which it creates its implementation networks. To contextualise this understanding, we provide a brief presentation of open data policies in Brazilian research and government, and finally, we describe a model that has been adopted for the GO FAIR Brazil implementation networks. The Brazilian Institute of Information in Science and Technology is responsible for the GO FAIR Brazil Office, which operates in all fields of knowledge and supports thematic implementation networks. Today, GO FAIR Brazil-Health is the first active implementation network in operation; it works in all health domains and serves as a model for other fields, such as agriculture, nuclear energy, and digital humanities, which are in the process of adherence negotiation. This report demonstrates the strong interest and effort from the Brazilian scientific communities in implementing the FAIR principles in their research data management practices.
Abstract: The FAIR principles have been accepted globally as guidelines for improving data-driven science and data management practices, yet the incentives for researchers to change their practices are presently weak. In addition, data-driven science has been slow to embrace workflow technology despite clear evidence of recurring practices. To overcome these challenges, the Canonical Workflow Frameworks for Research (CWFR) initiative suggests a large-scale introduction of self-documenting workflow scripts to automate recurring processes or fragments thereof. This standardised approach, with FAIR Digital Objects as anchors, will be a significant milestone in the transition to FAIR data without adding load onto the researchers who stand to benefit most from it. This paper describes the CWFR approach and the activities of the CWFR initiative over the course of the last year or so, highlights several projects that hold promise for the CWFR approaches, including Galaxy, Jupyter Notebook, and RO Crate, and concludes with an assessment of the state of the field and the challenges ahead.
Abstract: Metadata, data about other digital objects, play an important role in FAIR with a direct relation to all FAIR principles. In this paper we present and discuss the FAIR Data Point (FDP), a software architecture aiming to define a common approach to publish semantically-rich and machine-actionable metadata according to the FAIR principles. We present the core components and features of the FDP, its approach to metadata provision, the criteria to evaluate whether an application adheres to the FDP specifications, and the service to register, index and allow users to search for metadata content of available FDPs.
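To make "semantically-rich and machine-actionable metadata" more concrete, the sketch below builds a small dataset description as a JSON-LD style record. It is an assumption-laden illustration, not the FDP specification: the abstract does not name a vocabulary, so the choice of W3C DCAT and Dublin Core terms is ours, and the helper `dataset_metadata` and all `example.org` URLs are hypothetical.

```python
import json


def dataset_metadata(identifier, title, license_url, access_url):
    """Build a minimal JSON-LD style description of one dataset.

    The vocabulary (DCAT plus Dublin Core terms) is an assumption made
    for illustration; a real FDP deployment defines its own required fields.
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dcterms": "http://purl.org/dc/terms/",
        },
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dcterms:title": title,
        # An explicit license tells a machine the conditions for reuse.
        "dcterms:license": {"@id": license_url},
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": access_url},
        },
    }


# Hypothetical identifiers, used only to show the shape of the record.
record = dataset_metadata(
    "https://example.org/dataset/1",
    "Example measurements",
    "https://example.org/licenses/example-license",
    "https://example.org/dataset/1/download",
)
serialized = json.dumps(record, indent=2)
```

Because every field is keyed by a term from a shared vocabulary rather than by free text, a harvester can index the record without dataset-specific parsing rules.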
Funding: Supported by the Fundacao para a Ciencia e a Tecnologia through the LASIGE Research DB/00408/2020, UIDP/00408/2020; supported by the Federal Ministry of Education and Research of Germany (BMBF) un no. 16Dll128 ("Deutsches Internet-Institut").
Abstract: The open science movement has gained significant momentum within the last few years. This comes along with the need to store and share research artefacts, such as publications and research data. For this purpose, research repositories need to be established. A variety of solutions exist for implementing such repositories, covering diverse features, ranging from custom depositing workflows to social media-like functions. In this article, we introduce the FAIREST principles, a framework inspired by the well-known FAIR principles, but designed to provide a set of metrics for assessing and selecting solutions for creating digital repositories for research artefacts. The goal is to support decision makers in choosing such a solution when planning for a repository, especially at an institutional level. The metrics included are therefore based on two pillars: (1) an analysis of established features and functionalities, drawn from existing dedicated, general purpose and commonly used solutions, and (2) a literature review on general requirements for digital repositories for research artefacts and related systems. We further describe an assessment of 11 widespread solutions, with the goal of providing an overview of the current landscape of research data repository solutions and identifying gaps and research challenges to be addressed.
Funding: The Russian Foundation for Basic Research, grants 19-07-01198, 18-29-22096.
Abstract: The investigation proposes the application of an ontological semantic approach to describing workflow control patterns, research workflow step patterns, and the meaning of the workflows in terms of domain knowledge. The approach can provide wide opportunities for semantic refinement, reuse, and composition of workflows. Automatic reasoning allows verifying those compositions and implementations and provides machine-actionable workflow manipulation and problem-solving using workflows. The described approach can take into account the implementation of workflows in different workflow management systems, the organization of workflow collections in data infrastructures and the search for them, the semantic approach to the selection of workflows and resources in the research domain, the creation of research step patterns and their implementation reusing fragments of existing workflows, and the possibility of automating problem-solving based on the reuse of workflows. The application of the approach to CWFR concepts is proposed.