
Concept Index Supporting Large Scale Text Retrieval Under MapReduce Environment

Times cited: 1
Abstract: With the rapid development of information technology, the explosive growth and increasing diversity of data pose challenges for big data retrieval. MapReduce, a parallel processing framework, has clear advantages for processing big data. Drawing on concept lattice theory, this paper uses Formal Concept Analysis (FCA) to discover the relationships among text documents and represents them as a lattice. It proposes a novel formal concept index structure that supports large-scale text retrieval and presents algorithms for building the concept index on the MapReduce framework. In a comparison with a Lucene index, the concept index showed better query efficiency, which verifies the effectiveness of the proposed index. Experimental results show that representing document relationships with a concept lattice and building a concept index over it can improve the performance of large-scale text retrieval.
Authors: 张生 (Zhang Sheng), 胡加靖 (Hu Jiajing)
Source: Computer Engineering (《计算机工程》), 2015, No. 7, pp. 48-54 (7 pages). Indexed in CAS, CSCD, and the Peking University core journal list.
Keywords: big data; MapReduce framework; data retrieval; Formal Concept Analysis (FCA); concept lattice; concept index
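The abstract does not spell out the index layout or the algorithms, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It treats documents as the objects and their whitespace-separated terms as the attributes of a formal context. It simulates the MapReduce map, shuffle, and reduce steps in one process to build posting lists, then enumerates the formal concepts by closing the document intents under intersection. A conjunctive query is answered with the extent of the concept its terms generate. The function names (`map_phase`, `reduce_phase`, `build_inverted_index`, `concepts_from_index`, `concept_query`) are hypothetical. A real deployment would run the map and reduce functions on a cluster rather than in a loop.

```python
from collections import defaultdict


# --- MapReduce-style inverted index, simulated in a single process ---

def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair per distinct term of a document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(term, doc_ids):
    """Reduce: collapse all doc_ids emitted for a term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map, a simulated shuffle (group values by key), then reduce."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        for term, d in map_phase(doc_id, text):
            groups[term].append(d)
    return dict(reduce_phase(term, ids) for term, ids in groups.items())


# --- Formal concept analysis over the document-term context ---

def concepts_from_index(index, doc_ids):
    """Map every concept intent (a frozenset of terms) to its extent (documents).

    Objects are documents, attributes are terms. In a finite context the
    intents are exactly the intersections of object intents, together with
    the full attribute set, so they are collected by closing under intersection.
    """
    all_terms = frozenset(index)
    # Object intent g': the set of terms that document g contains.
    doc_terms = {d: frozenset(t for t, post in index.items() if d in post)
                 for d in doc_ids}
    intents = {all_terms}
    for row in doc_terms.values():
        intents |= {row & b for b in intents}
    # Extent B': the documents containing every term of the intent B.
    return {b: frozenset(d for d, row in doc_terms.items() if b <= row)
            for b in intents}


def concept_query(concepts, query_terms):
    """Answer a conjunctive query via the concept generated by its terms.

    The smallest intent containing the query is its closure Q''; that concept's
    extent is exactly the set of documents containing all query terms.
    """
    q = frozenset(query_terms)
    candidates = [b for b in concepts if q <= b]
    if not candidates:          # some query term never occurs in the corpus
        return []
    return sorted(concepts[min(candidates, key=len)])


if __name__ == "__main__":
    docs = {
        "d1": "mapreduce index lattice",
        "d2": "mapreduce index lucene",
        "d3": "lattice concept index",
    }
    index = build_inverted_index(docs)
    concepts = concepts_from_index(index, docs)
    print(len(concepts))                           # 7
    print(concept_query(concepts, ["mapreduce"]))  # ['d1', 'd2']
```

With the three toy documents, the context has seven concepts. The query for "mapreduce" lands on the concept whose intent is {mapreduce, index} and returns d1 and d2. Enumerating intersections naively can grow exponentially with the number of documents, so this single-process version only suits small examples.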