Extracting precise geographical information from textual content, referred to as toponym recognition, is fundamental to geographical information retrieval and crucial to many spatial analyses, e.g., mining location-based information from social media, news reports, and surveys for various applications. However, existing toponym recognition methods and tools do not perform well enough to support tasks that rely on extracting fine-grained geographic information from text, e.g., locating people who post help requests containing addresses on social media during disasters. Emerging pretrained language models have revolutionized machine processing and understanding of natural language, offering a promising pathway to improve toponym recognition for practical applications. In this paper, TopoBERT, a toponym recognition module based on a one-dimensional Convolutional Neural Network (CNN1D) and Bidirectional Encoder Representations from Transformers (BERT), is proposed and fine-tuned. Three datasets are used to tune the hyperparameters and identify the best training strategy. Another seven datasets are used to evaluate performance. TopoBERT achieves state-of-the-art performance (average F1-score = 0.854) compared with seven baseline models. It is packaged as easy-to-use Python scripts and can be applied to diverse toponym recognition tasks without additional training.
Funding: Reducing the Human Impacts of Flash Floods - Development of Microdata and Causal Model to Inform Mitigation and Preparedness (Award No. 1931301); Geospatial Artificial Intelligence Approaches for Understanding Location Descriptions in Natural Disasters and Their Spatial Biases (Award No. 2117771).