Social media provide interactive digital technologies that facilitate information sharing and exchange among individuals. In particular, during disasters, a massive corpus of posts is placed on platforms such as Twitter. Eyewitness accounts can benefit humanitarian organizations and agencies, but identifying the eyewitness Tweets related to a disaster among millions of Tweets is difficult. Different approaches have been developed to address this problem. The recent state-of-the-art system was based on a manually created dictionary, and this approach was further refined by introducing linguistic rules. However, these approaches are limited because they are dataset-dependent and not scalable. In this paper, we propose a method to identify eyewitnesses from Twitter. For the experiments, we used 13 features, identified by the pioneering work in this domain, that can be used to classify tweets as eyewitness accounts. For each feature, a dictionary of words was created with the Word Dictionary Maker algorithm, which is the key contribution of this research. The algorithm is initialized with a few terms relevant to a specific feature and then builds the word dictionary for that feature. Next, keyword matching is performed for each feature in the tweets. If a feature exists in a tweet, it is marked as 1; otherwise, 0. In this way, for the 13 features, we created a file that records which features are present in each tweet. To classify the tweets based on these features, Naïve Bayes, Random Forest, and Neural Network classifiers were used. The approach was applied to different disasters: earthquakes, floods, hurricanes, and forest fires. The results were compared with the state-of-the-art linguistic rule-based system, which achieved an F-measure of 0.81, whereas the proposed approach achieved an F-measure of 0.88. The results are comparable, and the proposed approach is not dataset-dependent. Therefore, it can be used for the identification of eyewitness accounts.
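The keyword-matching step described in the abstract (mark a feature as 1 if any word from its dictionary occurs in the tweet, otherwise 0) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three feature names, the dictionary contents, and the helper names `tokenize` and `feature_vector` are hypothetical, and the sketch uses only 3 features rather than the 13 used in the paper. The Word Dictionary Maker algorithm itself is not reproduced here, since the abstract does not specify how it expands the seed terms.

```python
import re

# Hypothetical per-feature word dictionaries. In the paper these would be
# produced by the Word Dictionary Maker algorithm from a few seed terms;
# the names and words below are illustrative assumptions only.
FEATURE_DICTIONARIES = {
    "first_person": {"i", "we", "my", "our"},
    "perceptual_senses": {"saw", "heard", "felt", "seeing", "hearing"},
    "exclamation": {"omg", "wow", "whoa"},
}


def tokenize(tweet):
    """Lowercase the tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())


def feature_vector(tweet, dictionaries=FEATURE_DICTIONARIES):
    """Return one 0/1 flag per feature, in dictionary insertion order.

    A flag is 1 when at least one token of the tweet appears in that
    feature's word dictionary, and 0 otherwise.
    """
    tokens = set(tokenize(tweet))
    return [1 if tokens & words else 0 for words in dictionaries.values()]


if __name__ == "__main__":
    for text in [
        "OMG I just felt the ground shake!",
        "Earthquake reported in Chile, officials say",
    ]:
        print(feature_vector(text), text)
```

Each resulting row of 0/1 flags corresponds to one line of the feature file mentioned in the abstract, and such rows could then be passed to a classifier such as Naïve Bayes or Random Forest.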
Contains 210 frequently used measure words. Includes nominal measure words, verbal measure words, concurrent measure words, etc. Compiled according to the HSK examination outline. Features multiple retrieval
Funding: This research is funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R54), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.