Funding: College of Communication and Information (CCI) Research and Creative Activity Fund, Kent State University.
Abstract: Purpose: To develop a set of metrics and identify criteria for assessing the functionality of LOD KOS products, while providing common guiding principles that LOD KOS producers and users can apply to maximize the functions and usages of those products. Design/methodology/approach: Data collection and analysis were conducted in three time periods: 2015–16, 2017 and 2019. The sample data used in the comprehensive data analysis comprise all datasets tagged as types of KOS in the Datahub and extracted through their respective SPARQL endpoints. A comparative study of the LOD KOS collected from the terminology services Linked Open Vocabularies (LOV) and BioPortal was also performed. Findings: The study proposes a set of Functional, Impactful and Transformable (FIT) metrics for LOD KOS as value vocabularies. The FAIR principles, with additional recommendations, are presented for LOD KOS as open data. Research limitations: The metrics need to be further tested and aligned with the best practices and international standards of both open data and various types of KOS. Practical implications: Assessment performed with FAIR and FIT metrics supports the creation and delivery of user-friendly, discoverable and interoperable LOD KOS datasets, which can be used for innovative applications, act as a knowledge base, become a foundation for semantic analysis and entity extraction, and enhance research in science and the humanities. Originality/value: Our research provides best practice guidelines for LOD KOS as value vocabularies.
Funding: The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 283770.
Abstract: Standards to describe soil properties are well established, with many ISO specifications and a few international thesauri available for specific applications. In addition, in recent years the European directive on "Infrastructure for Spatial Information in the European Community (INSPIRE)" has brought together most of the existing standards into a well-defined model. However, the adoption of these standards has so far not reached the level of semantic interoperability, as defined in the paper, that would facilitate building data services which reuse and combine data from different sources. This paper reviews standards for describing soil data and reports on the work done within the EC-funded agINFRA project to apply Linked Data technologies to existing standards and data in order to improve the interoperability of soil datasets. The main result of this work is twofold. First, an RDF vocabulary for soil concepts based on the UML INSPIRE model was published. Second, a KOS (Knowledge Organization System) for soil data was published and mapped to existing relevant KOS, based on the analysis of the SISI database of the CREA of Italy. This work also has methodological value, in that it proposes and applies a methodology to standardize metadata used in local scientific databases, a very common situation in the scientific domain. Finally, this work aims to contribute towards a wider adoption of the INSPIRE directive by providing an RDF version of it.
Abstract: The rapid increase in the publication of knowledge bases as linked open data (LOD) warrants serious consideration from all concerned, as this phenomenon will potentially scale exponentially. This paper briefly describes the evolution of LOD and the emerging world-wide semantic web (WWSW), and explores the scalability and performance features of the service-oriented architecture that forms the foundation of the semantic technology platform developed at MIMOS Bhd. for addressing the challenges posed by the intelligent future internet. The paper concludes with a review of the current status of linked open data in agriculture.
Funding: This work was supported by grants from the Federal Ministry for Economic Affairs and Energy of Germany (BMWi) for the projects SzenarienDB (03ET4057A-D), LOD-GEOSS (03EI1005A-G) and SIROP (03EI1035A-D).
Abstract: Heterogeneous data, different definitions and incompatible models are a huge problem in many domains, and the field of energy systems analysis is no exception. Hence, it is hard to re-use results, compare model results or couple models at all. Ontologies provide a precisely defined vocabulary to build a common and shared conceptualisation of the energy domain. Here, we present the Open Energy Ontology (OEO), developed for the domain of energy systems analysis. Using the OEO provides several benefits for the community. First, it enables consistent annotation of large amounts of data from various research projects. One example is the Open Energy Platform (OEP). Adding such annotations makes data semantically searchable, exchangeable, re-usable and interoperable. Second, computational model coupling becomes much easier. The advantages of using an ontology such as the OEO are demonstrated with three use cases: data representation, data annotation and interface homogenisation. We also describe how the ontology can be used for linked open data (LOD).
Abstract: The Semantic Web finally seems close to keeping its promise of a real world-wide graph of interconnected resources. The SPARQL query language and protocols and the Linked Open Data initiative have paved the way for endless data endpoints scattered around the globe. However, for the Semantic Web to really happen, it does not suffice to get billions of triples out there: these must be shareable, interlinked and conformant to widely accepted vocabularies. As more and more data are converted from the large knowledge repositories already held by companies and organizations, the question naturally arises whether these should be carefully converted to semantically consistent ontology vocabularies or given other, shallower representations. The danger is ending up with massive amounts of useless data, a boomerang effect that could undermine the success of the web of data. In this paper, I provide some insights into common problems that may arise when porting huge amounts of existing data or conceptual schemes (very common in the agriculture domain) to the Resource Description Framework (RDF), and address different modeling choices by discussing, in particular, the relationship between the two main modeling vocabularies offered by W3C: OWL and SKOS.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61070156 and 61100183) and the Natural Science Foundation of Zhejiang Province, China (No. Y1110477).
Abstract: Linked data is a decentralized space of interlinked Resource Description Framework (RDF) graphs that are published, accessed, and manipulated by a multitude of Web agents. Here, we present a multi-agent framework for mining hypothetical semantic relations from linked data, in which the discovery, management, and validation of relations can be carried out independently by different agents. These agents collaborate in relation mining by publishing and exchanging inter-dependent knowledge elements, e.g., hypotheses, evidence, and proofs, giving rise to an evidentiary network that connects and ranks diverse knowledge elements. Simulation results show that the framework is scalable in a multi-agent environment. Real-world applications show that the framework is suitable for interdisciplinary and collaborative relation discovery tasks in social domains.
Abstract: The data in electronic medical records (EMR) are complex in structure. They are independent, yet related to each other. To improve information access through the use of EMR, these data need to be annotated. If a semi-automated or automated annotation approach is expected to make the database more accessible, the annotation of metadata, that is, the resource data containing a meta-model of the database, is the basis of the annotation work. In this study, a method is proposed to transform terms that cannot be matched directly by changing their literal form while preserving their semantics, and then annotating them indirectly. After the transformation, a refinement method that is reducible to phrase sense disambiguation (PSD) is employed to ensure accuracy. A pilot study on a hospital database was conducted to test the accuracy and effectiveness of the proposed method.
Funding: Supported by the National Natural Science Foundation of China (60803160 and 61272110); the Key Projects of National Social Science Foundation of China (11&ZD189); the Natural Science Foundation of Hubei Province (2013CFB334); the Natural Science Foundation of Educational Agency of Hubei Province (Q20101110); the State Key Lab of Software Engineering Open Foundation of Wuhan University (SKLSE2012-09-07); the Teaching Research Project of Hubei Province (2011s005); and the Wuhan Key Technology Support Program (2013010602010216).
Abstract: Microblogs are social platforms with a huge user community and massive amounts of data. We propose a semantic recommendation mechanism for microblogs based on sentiment analysis. First, keywords and sentiment words are extracted by natural language processing, including segmentation, lexical analysis and strategy selection. Then, we query a background knowledge base built on linked open data (LOD) using the basic information of users. The experimental results show that, with sentiment analysis and semantic query, the accuracy of recommendation is in the range of 70% to 89%. Compared with traditional recommendation methods, this method better satisfies users' requirements.
Abstract: In the absence of a central naming authority on the Semantic Web, it is common for different data sets to refer to the same thing by different names. Whenever multiple names are used to denote the same thing, owl:sameAs statements are needed in order to link the data and foster reuse. Studies dating back as far as 2009 observed that the owl:sameAs property is sometimes used incorrectly. In our previous work, we presented an identity graph containing over 500 million explicit and 35 billion implied owl:sameAs statements, and presented a scalable approach for automatically calculating an error degree for each identity statement. In this paper, we generate subgraphs of the overall identity graph that correspond to certain error degrees. We show that even though the Semantic Web contains many erroneous owl:sameAs statements, it is still possible to use Semantic Web data while at the same time minimising the adverse effects of misusing owl:sameAs.
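The gap between explicit and implied owl:sameAs statements in the last abstract follows from owl:sameAs being symmetric and transitive: a few explicit links can merge many names into one identity set, and every pair of names in that set is then implied to be identical. The sketch below is only an illustration of that idea, not the authors' method. The function names, the `ex:` terms, and the choice to count ordered pairs of distinct terms are all assumptions made here; the paper may count implied statements differently.

```python
from collections import defaultdict


def identity_sets(same_as_pairs):
    """Group terms into identity sets (equivalence classes) with union-find.

    Hypothetical helper: same_as_pairs is an iterable of (term, term) tuples,
    each standing for one explicit owl:sameAs statement.
    """
    parent = {}

    def find(x):
        # Register unseen terms as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two identity sets

    groups = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)
    return list(groups.values())


def implied_statement_count(same_as_pairs):
    """Count ordered pairs of distinct terms within each identity set.

    Assumption: reflexive pairs (x sameAs x) are excluded from the count.
    """
    return sum(len(g) * (len(g) - 1) for g in identity_sets(same_as_pairs))


# Three explicit statements over five made-up terms.
explicit = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
print(identity_sets(explicit))
print(implied_statement_count(explicit))
```

Because the count grows quadratically with the size of an identity set, a single erroneous link that merges two large sets adds a very large number of wrong implied statements, which is one reason per-statement error degrees are useful.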