Abstract
With the maturing of the Web 2.0 era, new social networking platforms, represented by microblogs, generate large amounts of data that carry spatial location information and time stamps, known as location data. Location data is an important component of big data and, as a strategic resource, is now widely used in many areas of social life; acquiring it is the basis for location data mining and application. This paper analyzes in depth the characteristics of the three methods currently in common use for acquiring microblog data, namely methods based on APIs, on web crawlers, and on network data streams. Building on this analysis, it proposes a multi-strategy method for acquiring microblog location data and describes in detail the method's basic principle, workflow, and main features. Finally, the method is validated experimentally by acquiring location data from Sina Weibo, and the results confirm that it can acquire microblog location data comprehensively and efficiently.
Source
Journal of Geomatics Science and Technology (测绘科学技术学报)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2016, Issue 2, pp. 201-207 (7 pages)
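Funding
National Natural Science Foundation of China (41271450)
Young Scientists Fund of the National Natural Science Foundation of China (41401467)
National Key Technology R&D Program of China (2012BAK12B02)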