
An Adaptive Subspace Similarity Search Approach
Abstract: In recent years, similarity search has attracted considerable attention in database areas such as multimedia information retrieval, similarity join, and time series matching. Most existing work operates in Euclidean space and uses metric distance functions to compute nearest neighbors (for example kNN and kNNJ) in order to find the target set. However, earlier studies have shown that the accuracy of search results under these conditions is easily affected by dimensions with high dissimilarity, and that the corresponding solutions still lack flexibility and robustness. The paper first formulates the problem of similarity search over dynamic subspaces (partial dimensions) in a single-machine setting and proposes algorithms for it. Because the single-machine algorithms do not scale well as data volumes grow, distributed algorithms under the Hadoop framework are then proposed. Experiments show that the distributed algorithms outperform the centralized ones in performance without loss of accuracy.
Source: Telecommunications Science (《电信科学》), Peking University Core Journal, 2015, No. 7, pp. 63-74 (12 pages)
Keywords: adaptive subspace; similarity search; non-metric distance method; MapReduce distributed computing framework
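
The abstract's point that a few high-dissimilarity dimensions can distort a full-space Euclidean nearest-neighbor result can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes a simple partial-dimension distance that keeps only the `m` most similar dimensions for each pair of points, so the subspace changes from pair to pair; the names `partial_distance` and `knn_partial` are hypothetical. A distance of this kind is generally not a metric, since the triangle inequality can fail.

```python
import heapq


def partial_distance(a, b, m):
    """Illustrative partial-dimension distance (an assumption, not the paper's).

    Computes the squared difference in every dimension, keeps only the m
    smallest ones, and returns the square root of their sum. With
    m == len(a) this is the ordinary Euclidean distance.
    """
    diffs = sorted((x - y) ** 2 for x, y in zip(a, b))
    return sum(diffs[:m]) ** 0.5


def knn_partial(query, points, k, m):
    """Return the indices of the k points closest to query under partial_distance."""
    return heapq.nsmallest(
        k,
        range(len(points)),
        key=lambda i: partial_distance(query, points[i], m),
    )


if __name__ == "__main__":
    query = [0, 0, 0, 0]
    points = [
        [0, 0, 0, 10],  # identical to the query except for one outlying dimension
        [2, 2, 2, 2],   # moderately different in every dimension
        [5, 5, 5, 5],
    ]
    print(knn_partial(query, points, k=1, m=4))  # full-space Euclidean ranking
    print(knn_partial(query, points, k=1, m=3))  # ignores the worst dimension per pair
```

With all four dimensions, the point that differs moderately everywhere ranks first; once the single most dissimilar dimension per pair is ignored, the point that matches the query in three of four dimensions becomes the nearest neighbor.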
