Abstract
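Traditional crawler techniques suffer from low coverage and low efficiency when crawling trajectory data. This paper builds an efficient distributed crawler system for multi-source heterogeneous spatiotemporal data on a cloud computing architecture. Because the system acquires timestamp-based trajectory data at minute- and second-level intervals, and storage and computation cannot support such a large data volume, the paper builds on the TDTR algorithm and proposes a trajectory compression algorithm (STCA) that computes the distance between trajectories with a distance formula based on T-Map (MCTD), reducing the required storage space.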
Traditional crawler technology suffers from low coverage and efficiency when crawling trajectory data. An efficient crawler system for multi-source heterogeneous spatiotemporal data is built on Ali’s cloud computing technology architecture. Because the system obtains timestamp-based trajectory data at the sub-second level, storage and computation cannot support the resulting data volume. Based on the TD-TR algorithm, a trajectory compression algorithm is proposed that uses a T-Map-based measurement formula to calculate the MRTD distance between trajectories. The algorithm reduces the required storage space.
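For orientation, the baseline TD-TR (top-down time-ratio) idea that the abstract builds on can be sketched as follows. This is a minimal illustration of the commonly described TD-TR scheme, not the paper's STCA algorithm or its T-Map-based distance formula, neither of which the abstract defines. The names `Point`, `sed`, `td_tr`, and `tolerance`, and the `(t, x, y)` point layout, are assumptions made for this sketch.

```python
import math
from typing import List, Tuple

# Hypothetical point layout for this sketch: (timestamp, x, y).
Point = Tuple[float, float, float]


def sed(p: Point, start: Point, end: Point) -> float:
    """Synchronized Euclidean distance: distance from p to the position
    linearly interpolated between start and end at p's timestamp."""
    t, x, y = p
    t0, x0, y0 = start
    t1, x1, y1 = end
    # Guard against a zero-length time span.
    ratio = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
    xs = x0 + ratio * (x1 - x0)
    ys = y0 + ratio * (y1 - y0)
    return math.hypot(x - xs, y - ys)


def td_tr(points: List[Point], tolerance: float) -> List[Point]:
    """Top-down compression: keep the endpoints, and split recursively at
    the point with the largest time-synchronized deviation if that
    deviation exceeds the tolerance."""
    if len(points) <= 2:
        return list(points)
    start, end = points[0], points[-1]
    worst_index, worst_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = sed(points[i], start, end)
        if d > worst_dist:
            worst_index, worst_dist = i, d
    if worst_dist <= tolerance:
        return [start, end]
    left = td_tr(points[: worst_index + 1], tolerance)
    right = td_tr(points[worst_index:], tolerance)
    # The split point ends `left` and starts `right`; keep it once.
    return left[:-1] + right
```

Under these assumptions, a trajectory moving at constant velocity collapses to its two endpoints, while a trajectory that pauses and then moves keeps the point where the pause ends, even if all points lie on one spatial line, because the deviation is measured against the time-interpolated position.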
Authors
李顺
张圣华
朱美正
高龙
LI Shun; ZHANG Sheng-hua; ZHU Mei-zheng; GAO Long (North China Institute of Computer Technology, Beijing 100083, China; Chinese People’s Liberation Army Air Force Command College, Beijing 100083, China)
Source
《信息技术》
2020, Issue 1, pp. 75-78, 84 (5 pages)
Information Technology