An Adaptive Subspace Similarity Search Approach

Cited by: 1

Abstract: In recent years, similarity search has attracted considerable attention in database fields such as multimedia information retrieval, similarity join, and time series matching. Most existing work finds the target set by computing nearest neighbors (for example kNN and kNNJ) with a metric distance function in Euclidean space. Studies have shown, however, that the accuracy of results obtained this way is easily distorted by dimensions with high dissimilarity, and that the corresponding solutions lack flexibility and robustness. This paper first formulates the similarity search problem over dynamic subspaces (partial dimensions) in a centralized, single-machine setting and proposes algorithms for it. Because the single-machine algorithms do not scale well as the data grows, distributed algorithms under the Hadoop framework are then proposed. Experiments show that the distributed algorithms outperform the centralized ones in performance without loss of accuracy.
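
The partial-dimension idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes one simple non-metric distance that, for each candidate, keeps only the `keep` dimensions with the smallest squared differences and ignores the rest, so that a few highly dissimilar dimensions do not dominate the ranking. The names `partial_distance`, `subspace_knn`, and `keep` are hypothetical.

```python
import heapq


def partial_distance(query, point, keep):
    """Euclidean distance over only the `keep` smallest per-dimension gaps.

    Assumption for illustration: the subspace is chosen per candidate by
    discarding the dimensions with the largest squared differences.
    """
    gaps = sorted((q - p) ** 2 for q, p in zip(query, point))
    return sum(gaps[:keep]) ** 0.5


def subspace_knn(query, points, k, keep):
    """Return the indices of the k points nearest to `query` under partial_distance."""
    scored = ((partial_distance(query, p, keep), i) for i, p in enumerate(points))
    return [i for _, i in heapq.nsmallest(k, scored)]


query = [0.0, 0.0, 0.0]
points = [
    [0.1, 0.1, 9.0],  # close in two dimensions, one highly dissimilar dimension
    [1.0, 1.0, 1.0],  # moderately far in every dimension
    [0.2, 0.0, 0.1],  # close in every dimension
]

print(subspace_knn(query, points, k=2, keep=2))  # [2, 0]
print(subspace_knn(query, points, k=2, keep=3))  # [2, 1]
```

With `keep=2` the first point is still ranked second despite its single outlying dimension; with all three dimensions kept (ordinary Euclidean distance) that dimension pushes it out of the top two.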

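Authors: Ren Jianxin, Chen Huahui

Affiliation: College of Information Science and Engineering, Ningbo University

Source: Telecommunications Science (《电信科学》), Peking University Core Journal list, 2015, No. 7, pp. 63-74 (12 pages)

Keywords: adaptive subspace; similarity search; non-metric distance method; MapReduce distributed computing framework

Classification: TP393.092 [Automation and Computer Technology: Computer Application Technology]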

Citation network
Related literature

References (18)

1. Lian X, Chen L. Subspace similarity search under Lp-Norm. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(2):365-382.
2. Hinneburg A, Aggarwal C, Keim D A. What is the nearest neighbor in high dimensional spaces? Proceedings of the 26th VLDB Conference, Cairo, Egypt, 2000:506-515.
3. Zhang Hui, Zheng Jiping, Han Qiuting. BTreeU-Topk: a binary-tree-based top-k query algorithm on uncertain data [J]. Journal of Computer Research and Development, 2012, 49(10):2095-2105. Cited by: 2
4. Shi Y, Graham B. Similarity search problem research on multi-dimensional data sets. Proceedings of Tenth International Conference on Information Technology: New Generations (ITNG), Washington DC, USA, 2013:573-577.
5. Zhang Biao, Li Chuan, Xu Hongyu, Li Yanmei, Yang Ning, Luo Qian. Node similarity measurement in heterogeneous information networks based on characteristic subgraphs [J]. Telecommunications Science, 2014, 30(11):66-72. Cited by: 4
6. Watanabe S, Sawada H, Minami Y, et al. Fast similarity search on a large speech data set with neighborhood graph indexing. Proceedings of 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Dallas, USA, 2010:5358-5361.
7. Marios, Yannis. R-trees: a dynamic index structure for spatial searching. Boston, MA, USA, 1984:993-1002.
8. Kriegel H P, Kroger P, Schubert M, et al. Efficient query processing in arbitrary subspaces using vector approximations. Proceedings of the 18th International Conference on Scientific and Statistical Database Management, Washington DC, USA, 2006:184-190.
9. Zhang D X, Agrawal D, Chen G, et al. HashFile: an efficient index structure for multimedia data. Proceedings of the IEEE 27th International Conference on Data Engineering (ICDE), Washington DC, USA, 2011:1103-1114.
10. Datar M, Immorlica N, Indyk P, et al. Locality-sensitive hashing scheme based on p-stable distributions. Proceedings of the 20th Annual Symposium on Computational Geometry, New York, USA, 2004:253-262.

Secondary references (40)

1. Sarma A D, Benjelloun O, Halevy A, et al. Working models for uncertain data [C] //Proc of the 22nd IEEE Int Conf on Data Engineering (ICDE). Los Alamitos, CA: IEEE Computer Society, 2006.
2. Tao Yufei, Cheng R, Xiao Xiaokui, et al. Indexing multi-dimensional uncertain data with arbitrary probability density functions [C] //Proc of the 31st Int Conf on Very Large Data Bases (VLDB). New York: ACM, 2005:922-933.
3. Kriegel H P, Kunath P, Pfeifle M, et al. Probabilistic similarity join on uncertain data [C] //Proc of the Int Conf on Database Systems for Advanced Applications (DASFAA). Berlin: Springer, 2006:295-309.
4. Kriegel H P, Kunath P, Renz M. Probabilistic nearest-neighbor query on uncertain objects [C] //Proc of the 12th Int Conf on Database Systems for Advanced Applications (DASFAA). Berlin: Springer, 2007:337-348.
5. Pei Jian, Jiang Bin, Lin Xuemin, et al. Probabilistic skylines on uncertain data [C] //Proc of the 33rd Int Conf on Very Large Data Bases (VLDB). New York: ACM, 2007:15-26.
6. Soliman M A, Ilyas I F. Top-k query processing in uncertain databases [C] //Proc of the 23rd Int Conf on Data Engineering (ICDE). Los Alamitos, CA: IEEE Computer Society, 2007:896-905.
7. Soliman M A, Ilyas I F, Chang K C. Probabilistic top-k and ranking-aggregate queries [J]. ACM Trans on Database Systems (TODS), 2008, 33(3):1-54.
8. Ilyas I F, Beskales G, Soliman M A. A survey of top-k query processing techniques in relational database systems [J]. ACM Computing Surveys (CSUR), 2008, 40(4):1-58.
9. Hua Ming, Pei Jian, Zhang Wenjie, et al. Ranking queries on uncertain data: a probabilistic threshold approach [C] //Proc of the 2008 ACM SIGMOD Int Conf on Management of Data (SIGMOD). New York: ACM, 2008:673-686.
10. Yi Ke, Li Feifei, Kollios G, et al. Efficient processing of top-k queries in uncertain databases with x-relations [J]. IEEE Trans on Knowledge and Data Engineering (TKDE), 2008, 20(12):1669-1682.

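Co-citing literature (4)

1. Zhang Jun, Wang Yongli. An effective storage and query method for uncertain multi-dimensional sensor data [J]. Journal of Nanjing University of Science and Technology, 2014, 38(6):750-756.
2. Qiu Qingyu, Li Jing, Quan Bing, Tong Chao, Zhang Lijun, Zhang Haixian. Similarity search based on semantic features of bibliographic information networks [J]. Journal of Computer Applications, 2018, 38(5):1327-1333. Cited by: 4
3. Wang Nana, Gao Hong, Li Shanshan, Liu Wei. Hypergraphs based on heterogeneous hyperedges [J]. Journal of Guangdong University of Technology, 2017, 34(1):6-10. Cited by: 3
4. Wang Shaofeng, Guo Junxia, Lu Gang. Research on a user-oriented semantic prediction algorithm for heterogeneous networks [J]. Computer Engineering and Applications, 2018, 54(15):147-154. Cited by: 1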

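Co-cited literature (9)

1. Wang Zhe, Xu Yanwen. Simulation of a semantic information retrieval model based on differentiated fusion [J]. Microelectronics & Computer, 2015, 32(1):146-149. Cited by: 2
2. Zhang Yizhou. Research on personalized information retrieval methods based on user interest [J]. Journal of Modern Information, 2015, 35(6):25-28. Cited by: 4
3. Li Jinzhong, Yang Wei, Xia Jiewu, Zeng Xiaohui, Sun Lingyu. A learning-to-rank method based on Hooke & Jeeves pattern search [J]. Computer Engineering, 2015, 41(7):215-218. Cited by: 3
4. Wang Lijun. Simulation analysis of a text information retrieval algorithm for massive data [J]. Computer Simulation, 2016, 33(4):429-432. Cited by: 16
5. Zuo Jiali, Wang Mingwen, Wu Shuixiu, Wan Jianyi. An information retrieval model incorporating sentence-level retrieval [J]. Journal of Chinese Information Processing, 2016, 30(2):107-112. Cited by: 6
6. Wang Mingwen, Hong Huan, Jiang Aiwen, Zuo Jiali. A graph model for information retrieval based on term importance [J]. Journal of Chinese Information Processing, 2016, 30(4):134-141. Cited by: 11
7. Yan Yaoyao, Li Yongxian. Research on a cognitive model of information retrieval based on "scarcity theory" [J]. Journal of Intelligence, 2016, 35(11):136-140. Cited by: 13
8. He Xufeng, Chen Ling, Chen Gencai, Qian Kun, Wu Yong, Wang Jingchang. A collection selection method for distributed information retrieval based on the LDA topic model [J]. Journal of Chinese Information Processing, 2017, 31(3):125-133. Cited by: 22
9. Shen Xiajiong, Ye Manman, Gan Tian, Han Daojun. Concept-lattice-based information retrieval and its tree visualization [J]. Computer Engineering and Applications, 2017, 53(3):95-99. Cited by: 20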

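Citing literature (1)

1. Yu Xinyan, Sun Ruiling. Simulation study on fast retrieval of information resources for specific hospital patients [J]. Computer Simulation, 2017, 34(12):389-392. Cited by: 3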

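Secondary citing literature (3)

1. Ren Xuejun. Design of an intelligent management system for real-time financial settlement information of specific hospital patients [J]. Automation & Instrumentation, 2019(1):106-109. Cited by: 2
2. Zhou Qiuhong. Design of an artificial-intelligence-based multi-level retrieval machine for hospital archive database resources [J]. Automation & Instrumentation, 2019(10):187-190. Cited by: 1
3. Liu Aizhen. Analysis of the effect of implementing high-quality nursing services in obstetrics and gynecology [J]. Diet Science, 2019(6):291-292.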

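1. Sun Hui, Zhu Degang, Wang Hui, Zhao Jia. Particle swarm optimization with adaptive subspace Gaussian learning [J]. Journal of Nanchang Institute of Technology, 2015, 34(4):31-42. Cited by: 7
2. Chen Xuechang, Luo Xiaosuo, Yuan Yan. A rolling-window adaptive subspace predictive control method and its application [J]. Journal of Southwest China Normal University (Natural Science Edition), 2015, 40(6):124-129. Cited by: 1
3. Zhang Rangwen, Tian Xuemin. Design of an adaptive subspace predictive controller with a variable forgetting factor [J]. CIESC Journal, 2016, 67(3):858-864. Cited by: 6
4. Wang Shunhong, Meng Fei, Xia Zhaohui, Han Yong. Design of an adaptive subspace predictive controller with memory [J]. Journal of Projectiles, Rockets, Missiles and Guidance, 2014, 34(2):33-37.
5. Wu Xiaohong. A genetic algorithm with dual-space search evolution [J]. Journal of Huzhou Teachers College, 2003, 25(3):90-92.
6. Xu Yuanyuan, Chen Huahui. MapReduce-based similarity join on incremental data sets [J]. Application Research of Computers, 2014, 31(11):3369-3374. Cited by: 2
7. Zheng Liping. Application of encounter-zone sample classification [J]. Journal of Shandong University of Technology (Natural Science Edition), 2004, 18(6):57-60.
8. Yang Cheng, Feng Wei, Feng Hui, Yang Tao, Hu Bo. A sparsity adaptive subspace pursuit algorithm for compressive sampling [J]. Acta Electronica Sinica, 2010, 38(8):1914-1917. Cited by: 65
9. Zhong Lusheng, Yan Zheng, Gong Jinhong, Zhu Zhenmin. An adaptive subspace predictive control method for high-speed trains [J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2013, 41(8):61-67. Cited by: 3
10. Yao Minghai, Qu Xinyu. Gesture recognition based on adaptive subspace online PCA [J]. Pattern Recognition and Artificial Intelligence, 2011, 24(2):299-304. Cited by: 8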
