
Design and Implementation of POI Data Online System Based on NLP
Abstract: Comprehensive, rich Point of Interest (POI) data directly affects the location-based services that map app vendors provide. Traditional approaches to collecting POI data and bringing it online have long cycles and are slow; to address this, the paper proposes an efficient way to collect POI data and publish it online. The onboarding work is divided into three stages: data collection, data formatting, and data deduplication and storage. The data collection module uses a load-balanced distributed web crawler. The data formatting module normalizes the raw records produced by the collection module, whose formats are inconsistent. The deduplication module computes the similarity between the names of a new record and an existing record, then combines it with the distance computed from their latitudes and longitudes to decide whether the two refer to the same POI. The deduplication model, which combines Word2Vec with a Siamese-LSTM, achieves an accuracy of 93.5%.
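The deduplication rule described in the abstract (name similarity combined with the distance computed from latitude and longitude) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the paper scores name similarity with a Word2Vec + Siamese-LSTM model, which is swapped out here for a simple character-bigram Jaccard score purely for illustration. The function names, the record fields `name`/`lat`/`lon`, and the thresholds `sim_threshold=0.6` and `max_distance_m=200.0` are hypothetical, not values reported in the paper. Distance uses the standard haversine great-circle formula.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bigrams(name):
    """Character bigrams of a name with whitespace removed and case folded."""
    s = "".join(name.split()).lower()
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def name_similarity(a, b):
    """Hypothetical stand-in for the learned Word2Vec + Siamese-LSTM score.

    Returns the Jaccard overlap of character bigrams, a value in [0, 1].
    """
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)


def is_duplicate(old, new, sim_threshold=0.6, max_distance_m=200.0):
    """Treat two POI records as duplicates only if they are close AND similarly named.

    `old` and `new` are dicts with 'name', 'lat', 'lon'; thresholds are illustrative.
    """
    distance = haversine_m(old["lat"], old["lon"], new["lat"], new["lon"])
    if distance > max_distance_m:
        return False
    return name_similarity(old["name"], new["name"]) >= sim_threshold


if __name__ == "__main__":
    stored = {"name": "Starbucks Coffee (Xinhua Rd)", "lat": 31.8200, "lon": 117.2200}
    crawled = {"name": "Starbucks Coffee Xinhua Rd", "lat": 31.8204, "lon": 117.2203}
    print(is_duplicate(stored, crawled))
```

Checking distance first is a cheap filter: the costly name comparison (in the paper, a neural model) then runs only on candidate pairs that are already geographically close.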
Authors: Zhang Xianrong (张先荣), Zheng Guijun (郑贵俊) (School of Software Engineering, University of Science and Technology of China, Hefei 230051, Anhui, China)
Source: Computer Applications and Software (《计算机应用与软件》), Peking University core journal, 2020, Issue 12, pp. 17-25 (9 pages)
Keywords: data collection; data deduplication; POI data; Word2Vec; Siamese-LSTM; short text similarity

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部