
A Method to Build a Super Small but Practically Accurate Language Model for Handheld Devices (cited by: 2)

Abstract: This paper raises an important question: can a small language model be accurate enough in practice? It then analyzes the purpose of a language model, the problems a language model faces, and the factors that affect its performance. Finally, it proposes a novel language model compression method that makes a large language model usable in applications on handheld devices such as mobile phones, smart phones, personal digital assistants (PDAs), and handheld personal computers (HPCs). The method has three parts. First, the language model parameters are analyzed, and a criterion based on an importance measure of n-grams determines which n-grams are kept and which are removed. Second, a piecewise linear warping method compresses the uni-gram count values of the full language model. Third, a rank-based quantization method quantizes the bi-gram probability values. Experiments show that this compression method reduces the language model dramatically, to only about 1M bytes, with almost no loss in performance. This is good evidence that a language model compressed with a well-designed technique is accurate enough in practice, and it makes the language model usable on handheld devices.
Authors: 吴根清, 郑方
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2003, No. 6, pp. 747-755 (9 pages). Indexed in SCIE, EI, CSCD.
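The abstract names three compression steps but gives no formulas, so the following is only an illustrative sketch of what each step could look like, not the authors' implementation. Everything specific in it is an assumption made for illustration: raw count as the n-gram importance measure, the knot positions in `KNOTS`, the 8-bit target range, and the use of a geometric mean for each codebook entry. The function names are hypothetical as well.

```python
import math
from bisect import bisect_right

def prune_bigrams(bigram_counts, keep):
    """Keep the `keep` most important bi-grams.
    Assumption: importance is approximated by the raw count."""
    ranked = sorted(bigram_counts.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:keep])

# Assumed knots (count, code): fine resolution for small counts,
# coarse resolution for large ones, so a count fits in one byte.
KNOTS = [(0, 0), (100, 100), (10_000, 200), (1_000_000, 255)]

def _interp(value, points):
    """Piecewise linear interpolation through `points`, clamped at both ends."""
    xs = [x for x, _ in points]
    value = min(max(value, xs[0]), xs[-1])
    i = min(bisect_right(xs, value) - 1, len(points) - 2)
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return round(y0 + (value - x0) * (y1 - y0) / (x1 - x0))

def warp(count, knots=KNOTS):
    """Compress a uni-gram count into a small integer code."""
    return _interp(count, knots)

def unwarp(code, knots=KNOTS):
    """Approximate inverse of `warp`: swap the axes and interpolate again."""
    return _interp(code, [(y, x) for x, y in knots])

def rank_quantize(probs, levels=256):
    """Quantize probabilities by rank: sort them, cut the ranking into
    `levels` equally populated bins, and store only the bin index.
    Assumption: each codebook entry is the geometric mean of its bin."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i])
    codes = [0] * n
    bins = [[] for _ in range(levels)]
    for rank, i in enumerate(order):
        b = rank * levels // n
        codes[i] = b
        bins[b].append(math.log(probs[i]))
    codebook = [math.exp(sum(b) / len(b)) if b else None for b in bins]
    return codes, codebook
```

In this sketch, `warp` stores each uni-gram count in one byte and `unwarp` recovers an approximation at load time, while `rank_quantize` replaces each bi-gram probability with a small index into a shared codebook. Binning by rank rather than by value keeps the bins equally populated even when the probabilities are heavily skewed.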

