Abstract
This paper proposes a real-time processing framework for disaster-related microblogs. Built on the Spark stream computing platform, the framework uses information extracted from historical disaster microblog data to classify and aggregate incoming disaster microblogs in real time. It consists of three modules: a data collection module, a historical data processing module, and a real-time data processing module. To validate the framework, a prototype system was implemented using web crawling, text preprocessing, and text classification models. Taking the 2017 Jiuzhaigou and Linzhi earthquakes as case studies, experiments verified the usability of the prototype system.
Authors
郑嵘
张晨晓
乐鹏
梁哲恒
ZHENG Rong; ZHANG Chenxiao; YUE Peng; LIANG Zheheng (State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China; School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China; Guangdong South Digital Technology Co., Ltd., Guangzhou 510665, China)
Source
《测绘地理信息》
2020, No. 5, pp. 133-137 (5 pages)
Journal of Geomatics
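Funding
National Key Research and Development Program of China (2017YFB0504103)
National Natural Science Foundation of China (41722109)
Natural Science Foundation of Hubei Province (2018CFA053)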
Keywords
disaster emergency response
social media
real-time processing
text classification
stream computing platform