Funding: Reducing the Human Impacts of Flash Floods - Development of Microdata and Causal Model to Inform Mitigation and Preparedness (Award No. 1931301); Geospatial Artificial Intelligence Approaches for Understanding Location Descriptions in Natural Disasters and Their Spatial Biases (Award No. 2117771).
Abstract: Extracting precise geographical information from textual content, referred to as toponym recognition, is fundamental in geographical information retrieval and crucial in a plethora of spatial analyses, e.g. mining location-based information from social media, news reports, and surveys for various applications. However, the performance of existing toponym recognition methods and tools is deficient in supporting tasks that rely on extracting fine-grained geographic information from texts, e.g. locating people sending help requests with addresses through social media during disasters. Emerging pretrained language models have revolutionized natural language processing and understanding by machines, offering a promising pathway to optimize toponym recognition to underpin practical applications. In this paper, TopoBERT, a uniquely designed toponym recognition module based on a one-dimensional Convolutional Neural Network (CNN1D) and Bidirectional Encoder Representation from Transformers (BERT), is proposed and fine-tuned. Three datasets are leveraged to tune the hyperparameters and discover the best strategy to train the model. Another seven datasets are used to evaluate the performance. TopoBERT achieves state-of-the-art performance (average F1-score = 0.854) compared to the seven baseline models. It is encapsulated into easy-to-use Python scripts and can be seamlessly applied to diverse toponym recognition tasks without additional training.
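As a reading aid for the F1-score quoted above, the following is a minimal sketch of how precision, recall, and F1 can be computed for toponym recognition. It assumes exact-match scoring over character spans; the abstract does not say which matching scheme the authors used, and the function name `precision_recall_f1` and the example spans are hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of one toponym


def precision_recall_f1(gold: Iterable[Span], predicted: Iterable[Span]):
    """Exact-match precision, recall, and F1 over toponym spans.

    Assumption: a prediction counts as correct only when its span is
    identical to a gold span. Other schemes (partial overlap, token-level)
    would give different numbers.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three annotated toponyms, four predicted ones,
# two of which coincide exactly with the annotation.
gold_spans = [(0, 7), (20, 26), (40, 45)]
predicted_spans = [(0, 7), (20, 26), (50, 55), (60, 62)]
p, r, f1 = precision_recall_f1(gold_spans, predicted_spans)
```

An average F1-score such as the one reported would then be the mean of this quantity over the evaluation datasets, again under whatever matching scheme the evaluation adopts.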
Funding: The Trans-Atlantic Platform for the Social Sciences and Humanities, through the Digging into Data project with reference HJ-253525, and also through the Reassembling the Republic of Letters networking programme (EU COST Action IS1310). The researchers from INESC-ID also had financial support from Fundação para a Ciência e a Tecnologia (FCT), through project grants with references PTDC/EEI-SCR/1743/2014 (Saturn) and CMUP-ERI/TIC/0046/2014 (GoLocal), as well as through the INESC-ID multi-annual funding from the PIDDAC programme (UID/CEC/50021/2013).
Abstract: Several tasks related to geographical information retrieval and to the geographical information sciences involve toponym matching, that is, the problem of matching place names that share a common referent. In this article, we present the results of a wide-ranging evaluation of the performance of different string similarity metrics on the toponym matching task. We also report on experiments that use supervised machine learning to combine multiple similarity metrics, which has the natural advantage of avoiding the manual tuning of similarity thresholds. Experiments with a very large dataset show that the performance differences between the individual similarity metrics are relatively small, and that carefully tuning the similarity threshold is important for achieving good results. The methods based on supervised machine learning, particularly when considering ensembles of decision trees, can achieve good results on this task, significantly outperforming the individual similarity metrics.
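To make threshold-based toponym matching concrete, here is a minimal sketch using two common string similarity metrics, normalized edit distance and character-bigram Jaccard overlap. The abstract does not list the metrics evaluated, so these two are chosen for illustration only; all function names and the threshold value 0.7 are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical (case-insensitive)."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def bigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the character-bigram sets of the two names."""
    def bigrams(s: str) -> set:
        s = s.casefold()
        return {s[i:i + 2] for i in range(len(s) - 1)}

    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def is_match(a: str, b: str, threshold: float = 0.7) -> bool:
    """Declare a match when edit similarity reaches the threshold.

    The default 0.7 is an arbitrary illustrative value, not one from the article.
    """
    return edit_similarity(a, b) >= threshold


def similarity_features(a: str, b: str) -> tuple:
    """Feature vector that a supervised classifier could consume instead of a fixed threshold."""
    return (edit_similarity(a, b), bigram_jaccard(a, b))
```

With a single metric, the match decision depends entirely on the hand-picked threshold. Feeding a vector such as `similarity_features("Lisboa", "Lisbon")` to a trained classifier is one way to let the decision boundary be learned from labeled pairs instead.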
Abstract: Geographical names, or toponyms, in Beijing have a long history, distinctive characteristics, and rich cultural connotations and humanistic values, so protecting Beijing's toponymic culture is of great significance. Based on the results of the Second National Census of Geographical Names in Beijing, this paper summarizes the characteristics of geographical names in Beijing and their distribution pattern as presented in maps, and puts forward suggestions for protecting the toponymic culture.