
An Efficient Method for Similarity Search on Quantitative Transaction Data (一种有效的量化交易数据相似性搜索方法)

Cited by: 26
Abstract: Quantitative transaction data differ from ordinary transaction data in that the value on each dimension is numerical rather than binary. Finding an efficient similarity search method for such data is an important and challenging problem. This paper proposes a new similarity measure function, Hsim(), which better overcomes the shortcomings of the Lp-norm and other traditional distance functions in high-dimensional spaces and brings the distance computation for binary and numerical data into a single unified framework. Based on the characteristics of quantitative transaction data, a similarity index structure defined on Hsim() is constructed, and a similarity query method built on this index structure is described. Experiments show that the method achieves a high pruning rate for similarity search on quantitative transaction data and can therefore greatly speed up the search.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core list, 2004, No. 2, pp. 361-368 (8 pages)
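Funding: National "863" High-Tech Research and Development Program of China (2001AA113181); Shanghai Science and Technology Development Fund (015115010); Ministry of Information Industry Research and Trial-Production Program Fund (01XK310012)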
Keywords: similarity search; high dimensional data; distance function; quantitative transaction data; index structure
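
The abstract refers to the shortcomings of Lp-norm distance functions in high-dimensional spaces. A minimal sketch of that well-known effect follows: for points drawn uniformly at random, the gap between the nearest and the farthest point from a query shrinks relative to the nearest distance as the dimension grows. This is not the paper's Hsim() function, whose definition does not appear in this record. The function name `relative_contrast`, the uniform test data, and all parameter values are assumptions chosen only for illustration.

```python
import random


def relative_contrast(dim, n_points=200, p=2.0, seed=0):
    """Return (d_max - d_min) / d_min for Lp distances from one random query
    to n_points random points, all drawn uniformly from the unit cube [0, 1]^dim.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same value
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        # Lp distance: (sum_i |q_i - x_i|^p)^(1/p)
        total = sum(abs(a - b) ** p for a, b in zip(query, point))
        dists.append(total ** (1.0 / p))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The printed contrast should drop sharply as the dimension increases.
    for dim in (2, 20, 200):
        print(dim, round(relative_contrast(dim), 3))
```

A small relative contrast means the nearest and farthest neighbours are almost equally far away, which makes Lp-based pruning in an index less effective. That is the kind of weakness a measure such as Hsim() is described as addressing.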

