Abstract
In the era of big data, data volumes are growing explosively, and traditional index structures struggle to handle such large and complex data. Learned indexes have emerged to address this problem and have become one of the most active research topics in the database field. A learned index uses a machine learning model to construct the index: the model is trained on the relationship between data and their physical locations, captures the distribution characteristics of that relationship, and is then used to improve and optimize traditional indexes. Extensive experiments show that, compared with traditional indexes, learned indexes can adapt to large-scale data sets and provide better search performance with lower space requirements. This paper introduces the application background of learned indexes and reviews existing learned index models. According to data type, learned indexes are divided into two categories, one-dimensional and multi-dimensional, and the advantages, disadvantages, and supported queries of the models in each category are introduced and analyzed in detail. Finally, future research directions for learned indexes are discussed, with the aim of providing a reference for related research.
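The idea of learning the mapping from data to physical location can be illustrated with a minimal sketch. It is not any specific model surveyed in the paper: it assumes a non-empty, sorted list of distinct numeric keys held in memory, fits a single least-squares line from key to array position, and records the worst prediction error so that a lookup only needs a binary search inside a small window around the predicted position. The names `build_learned_index` and `lookup` are hypothetical.

```python
import bisect


def build_learned_index(keys):
    """Fit position ~ slope * key + intercept over sorted keys (least squares).

    Assumption: `keys` is a non-empty, sorted list of distinct numbers.
    """
    n = len(keys)
    mean_key = sum(keys) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in keys)
    cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_pos - slope * mean_key
    # The largest prediction error bounds the search window used at lookup time.
    max_err = max(abs(i - round(slope * k + intercept)) for i, k in enumerate(keys))
    return slope, intercept, max_err


def lookup(keys, model, key):
    """Return the position of `key` in `keys`, or None if it is absent."""
    slope, intercept, max_err = model
    n = len(keys)
    # Predict a position, clamp it into the array, then search only nearby.
    guess = min(max(round(slope * key + intercept), 0), n - 1)
    lo = max(0, guess - max_err)
    hi = min(n, guess + max_err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < n and keys[i] == key else None


keys = [2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
model = build_learned_index(keys)
print(lookup(keys, model, 21))  # position of an existing key
print(lookup(keys, model, 4))   # None, since 4 is not stored
```

The model here is just three numbers regardless of how many keys are stored, which hints at where the space savings come from; the cost is that a poorly fitting model yields a large error bound and a wide search window.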
Authors
王艺潭
王一舒
袁野
WANG Yitan; WANG Yishu; YUAN Ye (School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China)
Source
《计算机科学》
CSCD
Peking University Core Journals
2023, Issue 1, pp. 1-8 (8 pages)
Computer Science
Funding
National Key Research and Development Program of China (2022YFB2702100)
National Natural Science Foundation of China (61932004, 62225203, U21A20516).
Keywords
Learned index
Machine learning
Index construction
Data structure
Database