Abstract: We introduce an analytical framework for analyzing tweets to (1) identify and categorize fine-grained details about a disaster, such as affected individuals, damaged infrastructure, and disrupted services; and (2) distinguish impact areas and time periods, as well as the relative prominence of each category of disaster-related information across space and time. We first identify disaster-related tweets by generating a human-labeled training dataset and experimenting with a series of deep learning and machine learning methods for binary classification of disaster relatedness. We employ LSTM (Long Short-Term Memory) networks for the classification task because they outperform the other methods by considering the whole text structure using long-term semantic word and feature dependencies. Second, we perform an unsupervised multi-label classification of tweets using Latent Dirichlet Allocation (LDA) and identify latent categories of tweets such as affected individuals and disrupted services. Third, we employ spatially adaptive kernel smoothing and density-based spatial clustering to identify the relative prominence and impact areas for each information category, respectively. Using Hurricane Irma as a case study, we analyze a collection of over 500 million keyword-based and geo-located tweets posted before, during, and after the disaster. Our results highlight potential areas with a high density of affected individuals and infrastructure damage throughout the temporal progression of the disaster.
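To make the third step more concrete, the following is a minimal sketch of density-based spatial clustering applied to tweet locations. The abstract does not name a specific algorithm or its parameters, so the use of DBSCAN here is an assumption, and the function names, the planar toy coordinates, and the values of `eps` and `min_pts` are all hypothetical choices made for illustration only.

```python
from math import hypot

NOISE = -1  # label for points that belong to no dense region


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i], including i."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if hypot(x - xi, y - yi) <= eps]


def dbscan(points, eps, min_pts):
    """Assign each point a cluster id (0, 1, ...) or NOISE using DBSCAN."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep expanding
    return labels


# Hypothetical tweet locations for one information category (planar toy units,
# not real Hurricane Irma data): two dense groups and one isolated point.
sample_points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (9.0, 0.0),
]

labels = dbscan(sample_points, eps=0.2, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

In this toy run the two dense groups come out as candidate impact areas and the isolated point is treated as noise. Real tweet coordinates are longitude and latitude, so a geographic distance such as the haversine distance would likely be a better fit than the planar Euclidean distance used in this sketch.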