Automated metadata annotation is only as good as the training data or rules available for the domain. To understand the limitations and potential biases of a pre-trained machine learning algorithm, it is important to learn what type of content it was trained on. Consider what type of content is readily available for training an algorithm: what is popular and what is available. Scholarly and historical content, however, is often not available in consumable, homogenized, and interoperable formats at the large volumes that machine learning requires. There are exceptions, such as science and medicine, where large, well-documented collections are available. This paper presents the current state of automated metadata annotation in cultural heritage and research data, discusses challenges identified from use cases, and proposes solutions.
The data in electronic medical records (EMRs) are structurally complex: the individual data items are independent, yet related to each other. To improve information access through EMRs, these data need to be annotated. If a semi-automated or automated annotation approach is to make the database more accessible, it must build on annotation of the metadata, that is, the resource data that contain a meta-model of the database. This study proposes a method that takes terms which cannot be matched directly, changes their literal form while preserving their semantics, and then annotates them indirectly. After this transformation, a refinement step that is reducible to phrase sense disambiguation (PSD) is applied to ensure accuracy. A pilot study on a hospital database was conducted to test the accuracy and effectiveness of the proposed method.
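The abstract does not spell out the algorithm. As a rough illustration only, the Python sketch below shows one way indirect annotation of database column names could be organised: a term that does not match the vocabulary directly is transformed (identifier splitting, lowercasing, abbreviation expansion) so that its literal form changes but its meaning does not, then matched, then disambiguated. The abbreviation table, the toy vocabulary, and the concept IDs are all hypothetical, and the disambiguation step is a simple Lesk-style context-overlap heuristic used as a stand-in for PSD, not the method proposed in the study.

```python
import re

# Hypothetical toy vocabulary: phrase -> list of (concept_id, context words).
# Concept IDs and context words are invented for illustration only.
VOCABULARY = {
    "date of birth": [("C001", {"patient", "demographic", "age"})],
    "discharge": [
        ("C010", {"hospital", "admission", "leave", "ward"}),  # leaving hospital
        ("C011", {"wound", "fluid", "secretion", "symptom"}),  # bodily discharge
    ],
    "blood pressure": [("C020", {"vital", "sign", "measurement"})],
}

# Hypothetical abbreviation table driving the literal transformation.
ABBREVIATIONS = {"dob": "date of birth", "bp": "blood pressure",
                 "dis": "discharge", "pt": "patient", "dt": "date"}


def transform(term: str) -> str:
    """Change a term's literal form while keeping its meaning:
    split camelCase and separators, lowercase, expand abbreviations."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", term)
    tokens = re.split(r"[\s_\-]+", spaced.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens if t)


def candidates(term: str) -> list[str]:
    """Return vocabulary phrases that occur as whole words in the transformed term."""
    padded = f" {transform(term)} "
    return [phrase for phrase in VOCABULARY if f" {phrase} " in padded]


def disambiguate(phrase: str, context: set[str]) -> str:
    """Pick the sense whose context words overlap most with the given context
    (a simplified Lesk-style heuristic standing in for PSD)."""
    concept_id, _ = max(VOCABULARY[phrase], key=lambda sense: len(sense[1] & context))
    return concept_id


def annotate(column: str, context: set[str]) -> dict[str, str]:
    """Annotate a column name indirectly: transform, match, then disambiguate."""
    return {phrase: disambiguate(phrase, context) for phrase in candidates(column)}
```

For example, `annotate("WoundDis", {"wound", "symptom"})` resolves "discharge" to the hypothetical concept C011, while `annotate("DischargeDt", {"hospital", "ward"})` resolves it to C010. A real system would draw the vocabulary from a medical terminology and derive disambiguation context from the database schema; the toy dictionaries here only make the control flow visible.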
Heterogeneous data, differing definitions, and incompatible models are a major problem in many domains, and energy systems analysis is no exception. As a result, it is hard to re-use results, compare model results, or couple models at all. Ontologies provide a precisely defined vocabulary for building a common, shared conceptualisation of the energy domain. Here, we present the Open Energy Ontology (OEO), developed for the domain of energy systems analysis. Using the OEO offers the community several benefits. First, it enables consistent annotation of large amounts of data from various research projects; one example of where this is applied is the Open Energy Platform (OEP). Such annotations make data semantically searchable, exchangeable, re-usable, and interoperable. Second, it makes computational model coupling much easier. The advantages of using an ontology such as the OEO are demonstrated with three use cases: data representation, data annotation, and interface homogenisation. We also describe how the ontology can be used for linked open data (LOD).
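To make the idea of ontology-based annotation concrete, the Python sketch below links dataset columns to ontology concepts and serialises the links as N-Triples, a plain-text linked-data format. It is only a sketch under stated assumptions: the concept IRIs are placeholders on example.org, not real OEO identifiers; the choice of Dublin Core `dcterms:subject` as the linking predicate is an assumption, and the OEP's actual metadata schema may differ; the helper names are hypothetical. The point it illustrates is that two projects using different column names become searchable through the same concept.

```python
from dataclasses import dataclass

# Placeholder IRIs: these are NOT real OEO identifiers, just stand-ins.
ONTOLOGY = {
    "installed capacity": "https://example.org/oeo-placeholder/InstalledCapacity",
    "wind power plant": "https://example.org/oeo-placeholder/WindPowerPlant",
    "year": "https://example.org/oeo-placeholder/Year",
}
# Dublin Core "subject" predicate (assumed here as the annotation link).
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


@dataclass
class Column:
    name: str   # column name as used by the project
    label: str  # human-readable description supplied by the data provider


def annotate(dataset_iri: str, columns: list[Column]) -> list[tuple[str, str, str]]:
    """Link each column whose label matches an ontology term to that term's IRI."""
    triples = []
    for col in columns:
        concept = ONTOLOGY.get(col.label.strip().lower())
        if concept is not None:
            triples.append((f"{dataset_iri}#{col.name}", DCT_SUBJECT, concept))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise IRI-only triples in N-Triples syntax, one statement per line."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


def find_columns(triples: list[tuple[str, str, str]], concept_iri: str) -> list[str]:
    """Return every column, across datasets, annotated with the given concept."""
    return [s for s, _, o in triples if o == concept_iri]
```

For instance, a column `cap_mw` in one project and `capacity` in another, both labelled "installed capacity", are both returned by `find_columns` for the same concept IRI, even though their names differ. Unlabelled or unmatched columns are simply skipped rather than guessed.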
Funding: This work was supported by grants from the Federal Ministry for Economic Affairs and Energy of Germany (BMWi) for the projects SzenarienDB (03ET4057A-D), LOD-GEOSS (03EI1005A-G), and SIROP (03EI1035A-D).