
MapReduce in Text Processing (文本处理中的MapReduce技术) Cited by: 18
Abstract Many datasets used in text processing have reached terabyte, petabyte, or even larger scale, and traditional single-machine methods cannot process them effectively. The MapReduce computing framework, which emerged in recent years, addresses large-scale parallel data processing with a simple programming model and a distributed design, and it has been widely recognized and adopted in both academia and industry. MapReduce has already been applied to natural language processing, machine learning, large-scale graph processing, and statistical machine translation. This paper first gives a brief introduction to MapReduce and analyzes its characteristics, advantages, and limitations, as well as its differences from traditional techniques. It then classifies and summarizes recent applications of MapReduce across various aspects of text processing. Finally, it introduces research on MapReduce systems and performance and discusses possible future applications.
Authors Li Rui (李锐), Wang Bin (王斌)
Source Journal of Chinese Information Processing (《中文信息学报》), CSCD, PKU Core, 2012, No. 4, pp. 9-20 (12 pages)
Funding Natural Science Foundation project (61070111)
Keywords text processing; MapReduce; distributed computing; survey; Hadoop

