
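Implementation and Comparison of Two Chinese Word Segmentation Algorithms on a Cloud Computing Platform (cited by: 5)

Published English title: Two Chinese Word Segmentation Algorithms in the Realization of the Cloud Computing Platform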
Abstract: IKAnalyzer (IK) and ICTCLAS (IC) are currently the mainstream Chinese word segmentation algorithms. This paper first compares the two theoretically in terms of their performance in a single-machine environment. It then uses a framework made up of a Hadoop cluster, the Hadoop Distributed File System (HDFS), and MapReduce for parallel processing of large data sets, together with optimized versions of the algorithms, to compare through extensive experiments how the two perform when processing large data sets in a distributed environment.
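The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of running a word segmenter inside a map/reduce job. It is not the authors' code. The segmenter is a toy forward-maximum-matching routine standing in for IKAnalyzer or ICTCLAS, and `TOY_DICT`, `segment`, `map_phase`, `reduce_phase`, and `run_job` are hypothetical names. The shuffle step that Hadoop would perform between map and reduce is simulated in memory.

```python
from collections import defaultdict

# Hypothetical toy dictionary; real segmenters ship large lexicons.
TOY_DICT = {"中文", "分词", "算法", "云计算", "平台"}
MAX_WORD_LEN = max(len(word) for word in TOY_DICT)


def segment(text):
    """Forward maximum matching: a stand-in, not IKAnalyzer or ICTCLAS."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in TOY_DICT:
                tokens.append(piece)
                i += size
                break
    return tokens


def map_phase(lines):
    """Map: emit a (token, 1) pair for every token in every input line."""
    for line in lines:
        for token in segment(line.strip()):
            yield token, 1


def reduce_phase(pairs):
    """Reduce: sum the counts per token (grouping is simulated with a dict)."""
    counts = defaultdict(int)
    for token, n in pairs:
        counts[token] += n
    return dict(counts)


def run_job(lines):
    """Run map then reduce over an in-memory list of lines."""
    return reduce_phase(map_phase(lines))
```

In a real Hadoop deployment the map and reduce functions would run as separate tasks over HDFS blocks; here `run_job(["中文分词", "分词算法"])` simply chains them in one process to show the data flow.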
Source: Network Security Technology & Application (《网络安全技术与应用》), 2014, No. 12, pp. 67, 71 (2 pages)
Funding: Natural Science Foundation of Tianjin, contract No. 13JCYBJC16800
Keywords: IKAnalyzer; ICTCLAS; inverted sort; HDFS; MapReduce; Hadoop

References (3)

  • 1 Liu Qun, Zhang Huaping, Yu Hongkui, Cheng Xueqi. Chinese lexical analysis using cascaded hidden Markov model [J]. Journal of Computer Research and Development, 2004, 41(8): 1421-1429. Cited by: 197
  • 2 Cloudera. http://www.cloudera.com/blog/2009/02/the-small-files-problem/.
  • 3 Hadoop. Documentation and open source releases: http://hadoop.apache.org/core/.


Co-citing documents: 196

Co-cited documents: 43

Citing documents: 5

Secondary citing documents: 18
