Journal Article

Research on a Language Model Compression Method Combining Grouping-Based Count Cutoff and Rule Pruning
Abstract: Because they are trained on large corpora, statistical language models often exceed the storage capacity of handheld devices, and with the rapid spread of resource-constrained devices, research on language model compression has become increasingly important. This paper proposes a compression method that combines count cutoff with rule pruning, and applies grouping so that the model is compressed without reducing the number of units. The perplexity of the model produced by the new algorithm is compared against models compressed by count cutoff alone and by rule pruning alone. The experimental results show that, at the same model size, the new method yields a language model with lower perplexity, i.e. better performance.
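The two techniques named in the abstract are standard n-gram compression steps. The sketch below is a rough illustration, not the authors' implementation: it applies a count cutoff to a toy bigram table and then "rule-prunes" surviving entries whose conditional probability is close to the unigram fallback, a simplified stand-in for entropy-based pruning. The corpus, the `cutoff` and `eps` thresholds, and the pruning criterion are all invented for the example; the paper's grouping step, which keeps the number of units unchanged, is omitted.

```python
from collections import Counter

def train_bigram(tokens):
    """Raw unigram and bigram counts from a token stream."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    return uni, bi

def compress(uni, bi, cutoff=2, eps=0.05):
    """Count cutoff: discard bigrams seen fewer than `cutoff` times.
    Rule pruning (simplified): discard bigrams whose conditional
    probability is within `eps` of the unigram fallback, since dropping
    them barely changes the model's predictions."""
    total = sum(uni.values())
    kept = {}
    for (w1, w2), c in bi.items():
        if c < cutoff:                    # count cutoff
            continue
        p_cond = c / uni[w1]              # P(w2 | w1)
        p_back = uni[w2] / total          # unigram fallback P(w2)
        if abs(p_cond - p_back) <= eps:   # pruning: entry is redundant
            continue
        kept[(w1, w2)] = p_cond
    return kept

tokens = "a b a b a c a b a b".split()
uni, bi = train_bigram(tokens)
model = compress(uni, bi, cutoff=2, eps=0.05)
# The rare bigrams (a,c) and (c,a) are cut off; (a,b) and (b,a) survive
# because their conditional probabilities differ from the fallback.
```

A real system would prune by the change in model entropy (as in Stolcke's method cited below) rather than by raw probability distance, but the control flow is the same: threshold first, then test each surviving entry against its backoff estimate.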
Source: Computer Engineering & Science (《计算机工程与科学》, CSCD), 2008, No. 11, pp. 129-133 (5 pages)
Keywords: language model compression; count cutoff; rule pruning; grouping; perplexity
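Perplexity, the comparison metric used in the paper, is the exponential of the average negative log-probability a model assigns to held-out text; lower is better. A minimal sketch (the probability lists are invented for illustration):

```python
import math

def perplexity(probs):
    """Perplexity of a sequence given the model's per-token
    probabilities: exp of the mean negative log-probability."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

# A model assigning uniform probability 1/4 to every token in a
# sequence has perplexity 4, regardless of sequence length.
pp = perplexity([0.25] * 10)
```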

References (11)

  • 1 李晓光, 王大玲, 于戈. Information Retrieval Based on Statistical Language Models [J]. 计算机科学, 2005, 32(8): 124-127.
  • 2 邢永康, 马少平. A Survey of Statistical Language Models [J]. 计算机科学, 2003, 30(9): 22-26.
  • 3 Manning C D, Schütze H. Foundations of Statistical Natural Language Processing [M]. 苑春法, et al., trans. 北京: 电子工业出版社, 2005.
  • 4 Seymore K, Rosenfeld R. Scalable Backoff Language Models [C] // Proc of the 4th Int'l Conf on Spoken Language Processing, 1996.
  • 5 Stolcke A. Entropy-Based Pruning of Backoff Language Models [C] // Proc of the DARPA Broadcast News Transcription and Understanding Workshop, 1998.
  • 6 Wu G Q, Zheng F. A Method to Build a Super Small but Practically Accurate Language Model for Handheld Devices [J]. Journal of Computer Science & Technology, 2003, 18(6): 747-755.
  • 7 Brown P F, et al. Class-Based n-gram Models of Natural Language [J]. Computational Linguistics, 1992, 18(4): 467-479.
  • 8 Gao J F, Goodman J, et al. The Use of Clustering Techniques for Language Modeling: Application to Asian Languages [C] // Computational Linguistics and Chinese Language Processing, 2001.
  • 9 张仰森, 曹元大, 俞士汶. Complexity Measurement of Language Models and the Estimation of Chinese Entropy [J]. 小型微型计算机系统, 2006, 27(10): 1931-1934.
  • 10 Chen S F, Goodman J. An Empirical Study of Smoothing Techniques for Language Modeling [C] // Proc of the 34th Annual Meeting of the Association for Computational Linguistics, 1996.
