Abstract
With the rapid growth of web data and production data, data storage and querying face severe challenges. Data partitioning distributes massive data across multiple machines. This removes the storage-capacity limit of a single machine, and dividing the data into intervals also narrows the range a query has to search. To this end, methods for the partitioned storage and querying of massive data are studied. A scheme is designed that computes, from each record's angle and distance values, the data interval it belongs to, and stores the record in the file on the machine corresponding to that interval. A large data set is thus stored as many small files. A query first uses an index table to locate the relevant data intervals and the files that hold them, and then searches only those files. This narrows the query range, and multiple machines can process a query cooperatively to speed it up. Top-K queries were run on data partitioned and stored with this method, verifying its effectiveness.
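The abstract describes the pipeline only in outline. The sketch below illustrates one way it could work, under assumptions that are not taken from the paper: records are 2-D points; angle and distance are measured from the origin; the angle range is split into equal sectors and the distance into fixed-width rings, so each (sector, ring) pair is one data interval; each interval is stored in its own file, and files are assigned round-robin to machines. The names `interval_of`, `file_for`, `partition`, `top_k_nearest` and the parameters `N_SECTORS`, `RING_WIDTH`, `N_MACHINES` are hypothetical. The Top-K query is assumed to mean "the k points closest to the origin", since the abstract does not state its ranking function. Files are simulated as in-memory lists and machines as worker threads.

```python
import heapq
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parameters: the paper does not publish its interval sizes.
N_SECTORS = 8        # angle range [0, 2*pi) split into 8 equal sectors
RING_WIDTH = 10.0    # distance from the origin split into rings of width 10
N_MACHINES = 4       # number of simulated storage machines


def interval_of(x, y):
    """Map a point to its (sector, ring) interval from its angle and distance."""
    angle = math.atan2(y, x) % (2 * math.pi)      # normalize angle to [0, 2*pi)
    dist = math.hypot(x, y)
    sector = min(int(angle / (2 * math.pi / N_SECTORS)), N_SECTORS - 1)
    ring = int(dist // RING_WIDTH)
    return sector, ring


def file_for(interval):
    """Hypothetical naming: one file per interval, spread round-robin over machines."""
    sector, ring = interval
    machine = (ring * N_SECTORS + sector) % N_MACHINES
    return f"machine{machine}/s{sector}_r{ring}.dat"


def partition(points):
    """Distribute points into per-interval 'files' and build the index table."""
    files = defaultdict(list)   # file name -> stored points (stands in for disk files)
    index = {}                  # index table: interval -> file name
    for p in points:
        iv = interval_of(*p)
        name = file_for(iv)
        index[iv] = name
        files[name].append(p)
    return files, index


def local_top_k(points, k):
    """Top-K inside one file: the k points closest to the origin."""
    return heapq.nsmallest(k, points, key=lambda p: math.hypot(*p))


def top_k_nearest(files, index, k):
    """Scan rings outward; search each ring's files in parallel, then merge."""
    if k <= 0 or not index:
        return []
    max_ring = max(ring for _, ring in index)
    best = []
    with ThreadPoolExecutor(max_workers=N_MACHINES) as pool:
        for ring in range(max_ring + 1):
            # Every point in this ring lies at distance >= ring * RING_WIDTH,
            # so once k results are at least that close, no later ring can help.
            if len(best) == k and math.hypot(*best[-1]) <= ring * RING_WIDTH:
                break
            # Index-table lookup: only files whose interval lies in this ring.
            names = [index[(s, ring)] for s in range(N_SECTORS) if (s, ring) in index]
            partial = pool.map(lambda n: local_top_k(files[n], k), names)
            candidates = best + [p for part in partial for p in part]
            best = heapq.nsmallest(k, candidates, key=lambda p: math.hypot(*p))
    return best


if __name__ == "__main__":
    pts = [(3, 4), (-1, 1), (20, 0), (0, -35), (7, 7), (-12, -5)]
    files, index = partition(pts)
    print(top_k_nearest(files, index, 3))   # [(-1, 1), (3, 4), (7, 7)]
```

Because rings are visited in order of increasing distance, the search stops as soon as the k-th best result is no farther than the inner radius of the next ring; in the example only the ring-0 files are read. A different ranking function would need its own pruning bound per interval.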
Source
Journal of Nankai University (Natural Science Edition)
CAS
CSCD
Peking University Core Journals (PKU Core)
2017, No. 3, pp. 1-8 (8 pages)
Acta Scientiarum Naturalium Universitatis Nankaiensis
Funding
Tianjin Natural Science Foundation (14ZCZDGX00032, 14ZXDZGX00867, 15ZXDSGX00090, 15ZXHLGX00360, 15ZXHLGX00380)