Abstract
Similarity query processing is becoming increasingly important in many applications, such as data cleaning, record linkage, Web search, and document analysis. The proposed method implements complex similarity join algorithms using existing runtime operators rather than reinventing the wheel, so the system automatically benefits from future improvements to those operators. The method includes a technique that, during query optimization, transforms a similarity join plan into an efficient operator-based physical plan by using a template expressed largely in the system's user-level query language; this greatly simplifies the specification of such transformation rules. We use Apache AsterixDB, a parallel big data management system, to illustrate and validate the technique. We also report an experimental study, run on a parallel computing cluster with several large real-world datasets, that evaluates the similarity query support.
Authors
DU Wu; CHEN Lin (Yangtze University, Jingzhou 434000, China)
Source
Computer Knowledge and Technology (《电脑知识与技术》)
2020, Issue 5, pp. 3-4, 15 (3 pages in total)