
Similarity Queries Based on Apache AsterixDB
Abstract: Similarity query processing is becoming increasingly important in many applications, such as data cleaning, record linkage, Web search, and document analysis. The approach described here implements the complex similarity join algorithms with existing runtime operators rather than reinventing the wheel, so the system automatically benefits from future improvements to those operators. It includes a technique that, during query optimization, transforms a similarity join plan into an efficient operator-based physical plan by using a template expressed largely in the system's user-level query language; this greatly simplifies the specification of such transformation rules. We use Apache AsterixDB, a parallel big data management system, to illustrate and validate the technique. We also conducted an experimental study on several large real-world datasets on a parallel computing cluster to evaluate the similarity query support.
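To make the notion of a similarity join concrete, the following is a minimal sketch, not the authors' method and not AsterixDB code. It assumes word-token sets compared with Jaccard similarity and a brute-force nested loop; the function names `tokenize`, `jaccard`, and `similarity_join` are hypothetical, and a real system would replace the nested loop with an optimized physical plan.

```python
def tokenize(text):
    """Split a string into a set of lowercase word tokens (assumed tokenization)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def similarity_join(left, right, threshold):
    """Return (left_record, right_record, similarity) for every pair whose
    Jaccard similarity is at least `threshold`. Brute force: O(|left| * |right|)."""
    right_tokens = [(r, tokenize(r)) for r in right]
    results = []
    for l in left:
        l_tokens = tokenize(l)
        for r, r_tokens in right_tokens:
            sim = jaccard(l_tokens, r_tokens)
            if sim >= threshold:
                results.append((l, r, sim))
    return results


if __name__ == "__main__":
    left = ["apache asterixdb similarity query", "parallel database"]
    right = ["similarity query apache asterixdb system", "web search"]
    for l, r, sim in similarity_join(left, right, 0.5):
        print(l, "|", r, "|", round(sim, 2))
```

The quadratic pair enumeration is what makes the query expensive at scale, which is why the abstract stresses turning the join into an efficient physical plan built from existing operators.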
Authors: DU Wu; CHEN Lin (Yangtze University, Jingzhou 434000, China)
Affiliation: Yangtze University
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2020, No. 5, pp. 3-4, 15 (3 pages)
Keywords: big data management system; Apache AsterixDB; similarity query; parallel database; optimization
