Abstract
Retrieving records from massive volumes of vehicle-passage data is difficult. To address this, a large-scale distributed data retrieval system based on Solr is designed.

Ingestion works as follows. Data captured by front-end IPCs (IP cameras) is structured and sent to the back end, where it is first buffered in a message queue. The Spark Streaming real-time computing framework then consumes the buffered data and moves it into the HBase database. Finally, Solr crawls the data in HBase and builds index files according to the user's configuration.

Querying works as follows. The user submits query conditions by clicking in a Web interface. The system parses these conditions into query statements that Solr can recognize and retrieves the matching entries from the index files. It then fetches the complete records from HBase and returns them to the interface for display.

Test results show that the system runs stably and can store massive amounts of data of many types. Indexing speed reaches 1,000 records/s. With 100 billion vehicle-passage records stored in the database, the response time for queries on various conditions over this terabyte-scale data is under 10 s.
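The query path described in the abstract has three steps. Web-form conditions are translated into Solr syntax, Solr returns the matching entries, and the full records are then fetched from HBase. This path can be sketched as follows. It is a minimal Python sketch under stated assumptions, not the authors' implementation. The field names `plate_no`, `camera_id`, `vehicle_color`, `pass_time` and `rowkey` are hypothetical, as are the condition keys `plate_prefix`, `start`, `end` and `rows`. The escaped character set mirrors the one used by SolrJ's `ClientUtils.escapeQueryChars`. Solr and HBase are replaced by caller-supplied functions so that the sketch runs without either service.

```python
from datetime import datetime
from typing import Callable, Dict, List

# Characters with special meaning in the Solr standard query parser
# (the set SolrJ's ClientUtils.escapeQueryChars escapes); whitespace is
# handled separately in escape_solr.
_SOLR_SPECIAL = set('\\+-!():^[]"{}~*?|&;/')


def escape_solr(value: str) -> str:
    """Backslash-escape query-syntax characters and whitespace in a literal value."""
    return "".join("\\" + ch if ch in _SOLR_SPECIAL or ch.isspace() else ch
                   for ch in value)


def to_solr_time(t: datetime) -> str:
    """Format a (UTC) datetime in Solr's date syntax, e.g. 2016-01-01T00:00:00Z."""
    return t.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_solr_params(conditions: Dict) -> Dict:
    """Translate hypothetical Web-form conditions into Solr request parameters.

    All field names (plate_no, camera_id, vehicle_color, pass_time, rowkey)
    are hypothetical, not taken from the paper.
    """
    filters: List[str] = []
    if "plate_prefix" in conditions:
        # Prefix query: escaped literal followed by an unescaped wildcard.
        filters.append("plate_no:" + escape_solr(conditions["plate_prefix"]) + "*")
    for field in ("camera_id", "vehicle_color"):
        if field in conditions:
            filters.append(f"{field}:{escape_solr(conditions[field])}")
    if "start" in conditions or "end" in conditions:
        # Inclusive range query; '*' leaves a bound open.
        lo = to_solr_time(conditions["start"]) if "start" in conditions else "*"
        hi = to_solr_time(conditions["end"]) if "end" in conditions else "*"
        filters.append(f"pass_time:[{lo} TO {hi}]")
    # Match all documents and narrow with filter queries; return only row keys.
    return {"q": "*:*", "fq": filters, "fl": "rowkey",
            "rows": conditions.get("rows", 100)}


def two_phase_lookup(search: Callable[[Dict], List[str]],
                     hbase_get: Callable[[str], Dict],
                     params: Dict) -> List[Dict]:
    """Phase 1: the index returns only row keys; phase 2: fetch full rows by key."""
    return [hbase_get(key) for key in search(params)]


if __name__ == "__main__":
    # In-memory stand-ins for Solr and HBase, for illustration only.
    store = {"r1": {"plate_no": "浙A12345", "camera_id": "cam-01"}}
    params = build_solr_params({"plate_prefix": "浙A", "camera_id": "cam-01",
                                "start": datetime(2016, 1, 1),
                                "end": datetime(2016, 1, 2)})
    print(params["fq"])
    print(two_phase_lookup(lambda p: ["r1"], store.__getitem__, params))
```

Each condition goes into its own `fq` filter rather than one combined `q`. This lets Solr cache each filter independently, which helps when the same camera or time window recurs across queries. Requesting only the row-key field keeps the Solr response small, and the full record then comes from HBase. This matches the two-stage retrieval the abstract describes.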
Authors
CHENG Zhiqun (程知群); ZHANG Chao (章超); HAN GaoShuai (韩高帅)
School of Electronic Information, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
Source
Journal of Hangzhou Dianzi University: Natural Sciences, 2017, No. 1, pp. 11-15 (5 pages)
Keywords
big data
intelligent transportation
Solr
index