
A Novel Similarity Measure to Induce Semantic Classes and Its Application for Language Model Adaptation in a Dialogue System

Abstract: In this paper, we propose a novel similarity measure, based on co-occurrence probabilities, for inducing semantic classes. Clustering with the new similarity measure outperforms the widely used distance based on Kullback-Leibler divergence in precision, recall and F1 evaluation. In our experiments, we induced semantic classes from an unannotated in-domain corpus and then used the induced classes and structures to generate a large in-domain corpus, which was then used for language model adaptation. The character recognition rate improved from 85.2% to 91%. This points to a new way of addressing the lack of domain data for a dialogue system: first induction, then generation.
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), SCIE, EI, CSCD; 2012, No. 2, pp. 443-450 (8 pages)
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 10925419, 90920302, 10874203, 60875014, 61072124, 11074275, 11161140319.
Keywords: semantic class induction, lack of domain data, language model adaptation
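
The abstract names a distance based on Kullback-Leibler divergence as the baseline that the proposed measure is compared against. The record does not give either formula, so the following is only a minimal sketch of one common form of that baseline, not the paper's method: a symmetrised KL divergence between the neighbouring-word distributions of two words. The function names, the choice of immediate left/right neighbours as context, and the add-alpha smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_counts(sentences, target):
    """Count the words immediately left and right of `target` (assumed context definition)."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            if i > 0:
                counts[("L", tokens[i - 1])] += 1
            if i + 1 < len(tokens):
                counts[("R", tokens[i + 1])] += 1
    return counts


def smoothed_dist(counts, support, alpha=1.0):
    """Turn counts into probabilities over `support` with add-alpha smoothing (an assumption)."""
    total = sum(counts.values()) + alpha * len(support)
    return {c: (counts.get(c, 0) + alpha) / total for c in support}


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for distributions over the same support."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p)


def symmetric_kl_distance(counts_a, counts_b, alpha=1.0):
    """Symmetrised KL distance: D(p || q) + D(q || p). Smaller means more similar contexts."""
    support = set(counts_a) | set(counts_b)
    if not support:
        return 0.0
    p = smoothed_dist(counts_a, support, alpha)
    q = smoothed_dist(counts_b, support, alpha)
    return kl(p, q) + kl(q, p)


# Toy, made-up corpus purely for illustration.
toy_corpus = [
    ["fly", "to", "boston", "today"],
    ["fly", "to", "denver", "today"],
    ["book", "a", "flight", "now"],
]
boston = context_counts(toy_corpus, "boston")
denver = context_counts(toy_corpus, "denver")
flight = context_counts(toy_corpus, "flight")
```

In this toy corpus, "boston" and "denver" share the same neighbours and so sit at distance zero, while "flight" is further away; a clustering step would merge the closest pairs into a candidate semantic class.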

