Abstract
A new approximate query processing method was proposed to address the low query accuracy caused by data skew in approximate query processing tasks. First, the method was based on the conditional generative adversarial network and incorporated the conditional variational auto-encoder to ensure the stability of the algorithm and improve the accuracy of the model. The Wasserstein distance was used to measure the model error in order to prevent model collapse. Second, approximate query processing was implemented on top of this conditional generative model, so that users' queries were answered without accessing the underlying data, avoiding disk interaction. The model was combined with aggregate precomputation to form an efficient approximate query processing framework that answers interactive queries more accurately and quickly. Finally, an efficient voting algorithm was designed to filter both the samples generated by the model and the data within each sample, so as to improve the quality of the generated samples and minimize the query error. Experimental results show that, compared with other approximate query processing algorithms, the proposed method can effectively overcome the influence of data skew and answer user queries more accurately within a shorter interaction time.
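The idea of answering an aggregate query from samples rather than from the base table can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `estimate_aggregates`, the `(group, value)` row layout, and the uniform scaling factor are all assumptions made for illustration, and a hard-coded list stands in for the rows a generative model would produce.

```python
def estimate_aggregates(samples, predicate, population_size):
    """Estimate COUNT, SUM and AVG for rows whose group satisfies `predicate`.

    Hypothetical helper: `samples` is a list of (group, value) rows assumed
    to be drawn uniformly from a table of `population_size` rows.
    """
    if not samples:
        raise ValueError("at least one sample row is required")
    # Each sample row represents population_size / len(samples) table rows.
    scale = population_size / len(samples)
    matching = [value for group, value in samples if predicate(group)]
    return {
        "count": len(matching) * scale,
        "sum": sum(matching) * scale,
        # AVG needs no scaling; it is undefined when nothing matches.
        "avg": sum(matching) / len(matching) if matching else None,
    }


# Stand-in for generated samples (assumed data, not from the paper).
samples = [("a", 10.0), ("a", 20.0), ("b", 30.0), ("a", 30.0)]
result = estimate_aggregates(samples, lambda g: g == "a", population_size=1000)
print(result)
```

Under these assumptions the query touches only the in-memory sample, which is the property the abstract refers to as avoiding disk interaction. The estimate is only as good as the samples, which is why skewed data motivates a better sample generator.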
Authors
白文超
韩希先
王金宝
BAI Wen-chao; HAN Xi-xian; WANG Jin-bao (College of Computer Science and Technology, Harbin Institute of Technology, Weihai 264201, China)
Source
《浙江大学学报(工学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2022, Issue 5, pp. 995-1005 (11 pages)
Journal of Zhejiang University:Engineering Science
Funding
Supported by the National Natural Science Foundation of China (61872106, 61832003, 61632010).
Keywords
conditional generative adversarial network
conditional variational auto-encoder
approximate query processing
aggregate precomputation
data skew