Abstract
Large-scale data sets have grown beyond the terabyte and petabyte scale, and existing techniques can collect and store massive amounts of information. Although database management systems have been steadily improved to offer a variety of complex data management capabilities, their query tools cannot satisfy the needs of big data, so precisely understanding and exploring these massive data sets remains a major challenge. Interactive data exploration (IDE) emphasizes interaction, exploration and discovery, enabling users to find the information they need in vast amounts of data more accurately and at minimal cost. This paper first introduces IDE and its application background, summarizes the general exploration model and the features of IDE, and analyzes the state of the art in query recommendation techniques and query result optimization techniques for IDE. It then analyzes and compares IDE prototype systems. Finally, it gives a summary of and an outlook on IDE techniques.
Authors
王蒙湘
李芳芳
谷峪
于戈
WANG Mengxiang; LI Fangfang; GU Yu; YU Ge (Department of Computer Science, College of Computer Science and Engineering, Northeastern University, Shenyang 110819, China)
Source
《计算机科学与探索》
CSCD
Peking University Core Journals (北大核心)
2017, Issue 2, pp. 171-184 (14 pages)
Journal of Frontiers of Computer Science and Technology
Funding
National Natural Science Foundation of China, No. 61272180
Fundamental Research Funds for the Central Universities, No. N161604005
Keywords
interactive data exploration
query recommendation
optimization for query results
user feedback
machine learning