Funding: the National Natural Science Foundation of China [grant number 41631177] and the Chinese Academy of Sciences Key Project [grant number ZDRW-ZS-2016-6-3].
Abstract: Semantically aligning the heterogeneous geospatial datasets (GDs) produced by different organizations demands efficient similarity matching methods. However, the strategies employed to align the schema (concepts and properties) and the instances are usually not reusable, and the effects of unbalanced information tend to be neglected in GD alignment. To solve this problem, this paper presents a holistic approach that aligns the geospatial entities (concepts, properties and instances) simultaneously. Spatial, lexical, structural and extensional similarity metrics are designed and automatically aggregated by means of approval voting. The approach is validated with real geographical semantic webs, Geonames and OpenStreetMap. Compared with the well-known extensional-based aligning system, the presented approach not only considers more of the information involved in GD alignment, but also avoids manual parameter setting in metric aggregation. It reduces the dependency on specific information and makes the alignment more robust under the unbalanced distribution of various information.
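The abstract does not spell out how approval voting is applied, so the following is only a minimal sketch under stated assumptions: each similarity metric acts as a voter that approves the candidate (or tied candidates) it scores highest, and the candidate with the most approvals is taken as the match. The function name `approval_vote`, the candidate identifiers and the score values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def approval_vote(scores_by_metric):
    """Count approvals per candidate.

    scores_by_metric maps a metric name to {candidate: similarity}.
    Assumption: a metric approves every candidate tied at its own maximum.
    A metric with no score for a candidate simply casts no vote for it,
    so no weights or thresholds have to be set by hand.
    """
    approvals = Counter()
    for scores in scores_by_metric.values():
        if not scores:
            continue  # this metric has no information for the entity
        best = max(scores.values())
        for candidate, similarity in scores.items():
            if similarity == best:
                approvals[candidate] += 1
    return approvals


# Hypothetical similarity scores for one source concept against two candidates.
scores = {
    "spatial": {"osm:Park": 0.9, "osm:Forest": 0.4},
    "lexical": {"osm:Park": 0.7, "osm:Forest": 0.8},
    "structural": {"osm:Park": 0.6, "osm:Forest": 0.5},
    "extensional": {"osm:Park": 0.3},  # unbalanced: no instances for osm:Forest
}

votes = approval_vote(scores)
winner = votes.most_common(1)[0][0]
```

In this sketch the lexical metric alone prefers `osm:Forest`, but it is outvoted by the other three, which illustrates how aggregation by voting can reduce the dependency on any single kind of information.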
Funding: This research was supported by the National Natural Science Foundation of China [41631177, 41801320].
Abstract: As an effective form of organizing geographic information, a geographic knowledge graph (GeoKG) facilitates numerous geography-related analyses and services. The completeness of the triplets of geographic knowledge determines the quality of a GeoKG and has therefore drawn considerable attention in related domains. The mass of unstructured geographic knowledge scattered in web texts has been regarded as a potential source for enriching the triplets in GeoKGs. The crux of triplet extraction from web texts lies in the detection of key phrases that indicate the correct geo-relations between geo-entities. However, current methods for key-phrase detection are ineffective because the sparseness of the terms describing geo-relations in web texts results in an insufficient training corpus. In this study, an unsupervised context-enhanced method is proposed to detect geo-relation key phrases from web texts for extracting triplets. External semantic knowledge is introduced to relieve the influence of the sparseness of the geo-relation description terms in web texts. Specifically, the contexts of geo-entities are fused with category semantic knowledge and word semantic knowledge. Subsequently, an enhanced corpus is generated using frequency-based statistics. Finally, the geo-relation key phrases are detected from the enhanced contexts using the statistical lexical features from the enhanced corpus. Experiments are conducted with real web texts. In comparison with the well-known frequency-based methods, the proposed method improves the precision of detecting the key phrases of the geo-relation description by approximately 20%. Moreover, compared with the well-defined geo-relation properties in DBpedia, the proposed method provides quintuple the number of key phrases for indicating the geo-relations between geo-entities, which facilitates the generation of new triplets from web texts.
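To make the frequency-based statistics mentioned above concrete, the sketch below counts how many contexts each word n-gram appears in and treats frequent n-grams as key-phrase candidates. It illustrates only the kind of frequency-based baseline the abstract compares against, not the proposed context-enhanced method; the function names, the whitespace tokenization and the sample contexts are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous word n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_candidate_phrases(contexts, max_n=3):
    """Count, for each n-gram up to max_n words, the number of contexts containing it.

    Assumption: a context is the text found between two geo-entities,
    tokenized by lowercasing and splitting on whitespace.
    """
    counts = Counter()
    for context in contexts:
        tokens = context.lower().split()
        for n in range(1, max_n + 1):
            # set() so that a phrase is counted once per context
            counts.update(set(ngrams(tokens, n)))
    return counts


# Hypothetical contexts extracted between pairs of geo-entities.
contexts = [
    "flows into the",
    "flows into",
    "is a tributary of the",
    "empties and flows into",
]

counts = count_candidate_phrases(contexts)
```

With so few contexts, a phrase such as "tributary of" is seen only once, which is the sparseness problem the abstract describes and the reason external semantic knowledge is introduced to enhance the corpus.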