
Method of Parallel Query for Massive XML Data Based on Multi-predicate Selectivity (times cited: 3)
Abstract: To address the problem of querying massive XML data, this paper proposes a parallel query processing method for massive XML data based on multi-predicate selectivity under the MapReduce programming model. The method queries massive XML data in parallel, and the parallel query results it produces satisfy the user's multi-predicate query requirements. First, a storage method for massive XML data is proposed: the data is partitioned into many XML data blocks, which are stored in HDFS. Next, a Map logic algorithm and a Reduce logic algorithm based on multi-predicate selectivity are proposed under the MapReduce programming model to carry out parallel query processing of massive XML data. Furthermore, a MapReduce query optimization method based on multi-predicate selectivity is proposed, which reduces the amount of data transmitted in the system and improves system performance. Finally, experimental results demonstrate the efficiency and effectiveness of the proposed approach.
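The abstract does not spell out the Map and Reduce logic, so the following Python sketch is only an illustration of the pipeline it describes, simulated in a single process. It rests on several assumptions that are not from the paper: the XML data is a flat sequence of record elements; blocks are cut at record boundaries so each one (standing in for an XML data block in HDFS) parses on its own; a multi-predicate query is a conjunction of (field, operator, constant) triples; and "selectivity" is approximated as each predicate's pass rate on a sample block. The functions partition, holds, order_by_selectivity, map_block, reduce_group and run_query, the <book> schema, and the block size are all hypothetical.

```python
import operator
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import islice

# Comparison operators a predicate may use (an assumption of this sketch).
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def partition(records, block_size):
    """Group XML record strings into well-formed <block> documents.

    Each block stands in for one XML data block stored in HDFS; cutting
    at record boundaries keeps every block independently parseable.
    """
    it = iter(records)
    blocks = []
    while chunk := list(islice(it, block_size)):
        blocks.append("<block>" + "".join(chunk) + "</block>")
    return blocks


def holds(record, pred):
    """Evaluate one (field, op, constant) predicate on a parsed record."""
    field, op, const = pred
    text = record.findtext(field)
    if text is None:
        return False
    if isinstance(const, (int, float)):
        try:
            text = float(text)  # numeric constant: compare numerically
        except ValueError:
            return False
    return OPS[op](text, const)


def order_by_selectivity(sample, preds):
    """Sort predicates by pass rate on a sample, most selective first."""
    def pass_rate(pred):
        return sum(holds(r, pred) for r in sample) / max(len(sample), 1)
    return sorted(preds, key=pass_rate)


def map_block(block, preds, key_field, out_fields):
    """Map phase: emit (key, projection) only for records meeting every predicate.

    Filtering and projecting here keeps non-matching records and unused
    fields out of the shuffle.
    """
    for record in ET.fromstring(block):
        if all(holds(record, p) for p in preds):  # stops at first failing predicate
            yield record.findtext(key_field), {f: record.findtext(f) for f in out_fields}


def reduce_group(key, values):
    """Reduce phase: gather all projections that share a key."""
    return key, list(values)


def run_query(blocks, preds, key_field, out_fields):
    """Simulate map -> shuffle -> reduce in a single process."""
    shuffled = defaultdict(list)
    for block in blocks:  # in Hadoop, each block would feed its own map task
        for key, value in map_block(block, preds, key_field, out_fields):
            shuffled[key].append(value)
    return dict(reduce_group(k, v) for k, v in shuffled.items())


if __name__ == "__main__":
    records = [
        "<book><genre>db</genre><year>2012</year><price>35</price><title>A</title></book>",
        "<book><genre>db</genre><year>2014</year><price>80</price><title>B</title></book>",
        "<book><genre>ai</genre><year>2015</year><price>20</price><title>C</title></book>",
        "<book><genre>db</genre><year>2015</year><price>45</price><title>D</title></book>",
    ]
    blocks = partition(records, block_size=2)
    preds = [("year", ">=", 2013), ("price", "<", 50)]
    preds = order_by_selectivity(list(ET.fromstring(blocks[0])), preds)
    print(run_query(blocks, preds, "genre", ["title", "price"]))
    # {'ai': [{'title': 'C', 'price': '20'}], 'db': [{'title': 'D', 'price': '45'}]}
```

Applying every predicate and the projection inside the map function is one plausible reading of how such an optimization cuts data transmission: records that fail any predicate, and fields the query does not return, never enter the shuffle. Evaluating the most selective predicate first lets the short-circuiting conjunction reject most records after a single comparison. A real deployment would run map_block as a Hadoop map task over each HDFS block rather than in a loop.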
Authors: Yan Wei, Ma Zongmin
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2015, No. 7, pp. 1415-1420 (6 pages)
Funding: National Natural Science Foundation of China (61370075, 60873010); Liaoning University Youth Scientific Research Fund (2012LDQN19)
Keywords: massive XML data; MapReduce programming model; multi-predicate selectivity; parallel query

References: 4

Secondary references: 50


Works sharing references: 59

Co-cited works: 20

Citing works: 3

Secondary citing works: 3
