Abstract: The volume of publicly available geospatial data on the web is rapidly increasing due to advances in server-based technologies and the ease with which data can now be created. However, challenges remain in connecting individuals searching for geospatial data with the servers and websites where such data exist. The objective of this paper is to present a publicly available Geospatial Search Engine (GSE) that uses a web crawler built on top of the Google search engine to search the web for geospatial data. The crawler seeding mechanism combines search terms entered by users with predefined keywords that identify geospatial data services. A procedure runs daily to update map server layers and metadata, and to eliminate servers that go offline. The GSE supports Web Map Services, ArcGIS services, and websites that have geospatial data for download. We applied the GSE to search for all available geospatial services under these formats and provide search results including the spatial distribution of all obtained services. While enhancements to our GSE and to web crawler technology in general lie ahead, our work represents an important step toward realizing the potential of a publicly accessible tool for discovering the global availability of geospatial data.
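The seeding step described in the abstract (user search terms combined with predefined service keywords) might look roughly like the following. This is a minimal sketch only: the abstract does not list the keywords the GSE uses, so the `SERVICE_KEYWORDS` table and the `build_seed_queries` function are hypothetical names and values chosen for illustration.

```python
# Hypothetical keyword table: the abstract does not enumerate the predefined
# keywords, so these strings are illustrative assumptions only.
SERVICE_KEYWORDS = {
    "wms": ['"service=WMS"', '"request=GetCapabilities"'],
    "arcgis": ['"arcgis/rest/services"'],
    "download": ["shapefile download"],
}


def build_seed_queries(user_terms, keywords=SERVICE_KEYWORDS):
    """Pair every non-empty user term with every predefined keyword.

    Returns a list of (service_type, query_string) tuples that a crawler
    could submit to a general-purpose search engine as seeds.
    """
    queries = []
    for term in user_terms:
        term = term.strip()
        if not term:
            continue  # skip blank input rather than emitting keyword-only queries
        for service_type, service_keywords in keywords.items():
            for keyword in service_keywords:
                queries.append((service_type, f"{term} {keyword}"))
    return queries


if __name__ == "__main__":
    for service_type, query in build_seed_queries(["land cover"]):
        print(service_type, "->", query)
```

Tagging each query with its service type is a design choice of this sketch; it would let a later step decide how to validate a hit (for example, as a Web Map Service or an ArcGIS service).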
Funding: Supported by the National Natural Science Foundation of China [Nos. 41971349, 41930107, 42090010 and 41501434] and the National Key Research and Development Program of China [Nos. 2017YFB0503704 and 2018YFC0809806].
Abstract: Without explicit descriptions of map application themes, it is difficult for users to discover desired map resources among massive online Web Map Services (WMS). However, metadata-based map application theme extraction is a challenging multi-label text classification task due to limited training samples, mixed vocabularies, and the variable length and arbitrary content of text fields. In this paper, we propose a novel multi-label text classification method, Text GCN-SW-KNN, based on geographic semantics and collaborative training to improve classification accuracy. The semi-supervised collaborative training adopts two base models: a Text Graph Convolutional Network (Text GCN) modified by utilizing the Semantic Web, named Text GCN-SW, and the widely used Multi-Label K-Nearest Neighbor (ML-KNN). Text GCN-SW improves on Text GCN by adjusting the adjacency matrix of the heterogeneous word-document graph with the shortest semantic distances between themes and words in the metadata text. The distances are calculated with the Semantic Web of Earth and Environmental Terminology (SWEET) and WordNet dictionaries. Experiments on both the WMS and layer metadata show that the proposed methods achieve higher F1-scores and accuracy than state-of-the-art baselines, and demonstrate better stability in repeated experiments and robustness to reduced training data. Text GCN-SW-KNN can be extended to other multi-label text classification scenarios to better support metadata enhancement and geospatial resource discovery in the Earth science domain.
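The adjacency adjustment mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the actual weighting formula, so everything here is an assumption made for illustration: the function name `reweight_edges`, the 1 / (1 + d) scaling, and the choice to leave words with no known distance unchanged. It is not the authors' method.

```python
def reweight_edges(adjacency, word_index, theme_distance):
    """Scale edges of a word-document graph by semantic closeness to a theme.

    adjacency      -- square list-of-lists of edge weights (words and documents)
    word_index     -- maps each word to its node index in the adjacency matrix
    theme_distance -- maps a word to its shortest semantic distance to any theme

    ASSUMPTION: the factor 1 / (1 + d) is an illustrative choice; the source
    does not specify how distances modify the adjacency matrix.
    """
    size = len(adjacency)
    scale = [1.0] * size  # document nodes and unknown words keep a factor of 1
    for word, index in word_index.items():
        distance = theme_distance.get(word)
        if distance is not None:
            scale[index] = 1.0 / (1.0 + distance)

    adjusted = [row[:] for row in adjacency]
    for i in range(size):
        for j in range(size):
            if i != j:  # leave self-loops untouched
                adjusted[i][j] = adjacency[i][j] * scale[i] * scale[j]
    return adjusted


if __name__ == "__main__":
    # Node 0 is a document; nodes 1 and 2 are the words "river" and "road".
    graph = [[1.0, 0.5, 0.4],
             [0.5, 1.0, 0.2],
             [0.4, 0.2, 1.0]]
    print(reweight_edges(graph, {"river": 1, "road": 2}, {"river": 0, "road": 1}))
```

Multiplying by the factors of both endpoints keeps the adjusted matrix symmetric, which is convenient for the usual symmetric normalization step in graph convolutional networks.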