Abstract
Extracting the development trends of hot events from massive volumes of news data is difficult. To address this, an event-based regional situation analysis method is proposed and applied to Hong Kong events as a modeling case study. First, the Solr search engine is used to search massive news text data quickly and efficiently. Next, natural language processing techniques segment the news text into words, and each news article is scored using purpose-built event severity and government attitude dictionaries. Then, a calculation model that accounts for the cumulative effect of events and the decay characteristics of news is established to measure event severity and government attitude. Finally, the GDELT (Global Database of Events, Language, and Tone) event dataset is used to verify the accuracy of the model.
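The abstract does not state the scoring or decay formulas, so the following Python sketch only illustrates one plausible reading of the two modeling steps, under explicit assumptions. `SEVERITY_DICT` and `ATTITUDE_DICT` and their weights are hypothetical. Whitespace tokenization stands in for real (Chinese) word segmentation. A single-article score is assumed to be the sum of dictionary weights. The "accumulation plus news decay" idea is modeled as exponential decay with a hypothetical `half_life_days` parameter. None of this is the authors' actual model.

```python
import math
from datetime import date

# Hypothetical severity dictionary: term -> weight (illustrative values only)
SEVERITY_DICT = {"protest": 1.0, "clash": 2.0, "riot": 3.0, "violence": 3.0}
# Hypothetical attitude dictionary: positive = conciliatory, negative = hard-line
ATTITUDE_DICT = {"dialogue": 1.0, "negotiate": 1.0, "condemn": -1.0, "crackdown": -2.0}


def tokenize(text):
    """Stand-in for word segmentation: split on whitespace, strip punctuation, lowercase."""
    return [tok.strip(".,;:!?\"'()").lower() for tok in text.split()]


def score_article(text, lexicon):
    """Single-article score (assumed form): sum of lexicon weights of its tokens."""
    return sum(lexicon.get(tok, 0.0) for tok in tokenize(text))


def decayed_index(articles, lexicon, day, half_life_days=7.0):
    """Cumulative index on `day`: each earlier article's score, decayed exponentially by age.

    `articles` is a list of (publication_date, text) pairs; articles published
    after `day` are ignored. The exponential form is an assumption.
    """
    lam = math.log(2) / half_life_days  # decay rate giving the chosen half-life
    total = 0.0
    for published, text in articles:
        age = (day - published).days
        if age >= 0:
            total += score_article(text, lexicon) * math.exp(-lam * age)
    return total


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only
    articles = [
        (date(2019, 8, 1), "Police clash with protest crowd"),
        (date(2019, 8, 8), "Government calls for dialogue"),
    ]
    day = date(2019, 8, 8)
    print("severity:", decayed_index(articles, SEVERITY_DICT, day))
    print("attitude:", decayed_index(articles, ATTITUDE_DICT, day))
```

With a 7-day half-life, the first toy article (severity score 3.0) contributes half its score one week later, so the severity index on the second date is about 1.5. Exponential decay is chosen here only because it makes older news fade smoothly while still letting repeated events accumulate; the paper's own decay function may differ.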
Authors
LI Zuokang; WANG Yanyan; GAO Jing (The 28th Research Institute of China Electronics Technology Group Corporation, Nanjing 210007, China)
Source
Command Information System and Technology, 2021, No. 1, pp. 55-59, 64 (6 pages)
Funding
Supported by the National Key Research and Development Program of China (2018YFC0806900).
Keywords
text processing
dictionary
Hong Kong event
event model