Abstract
In this paper, we propose a novel similarity measure, based on co-occurrence probabilities, for inducing semantic classes. Clustering with the new similarity measure outperforms the widely used distance based on Kullback-Leibler divergence in precision, recall, and F1. In our experiments, we induced semantic classes from an unannotated in-domain corpus and then used the induced classes and structures to generate a large in-domain corpus, which was in turn used for language model adaptation. The character recognition rate improved from 85.2% to 91%. These results suggest a new way to address the lack of in-domain data for a dialogue system: induce first, then generate.
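For orientation, the Kullback-Leibler baseline mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define the proposed co-occurrence measure, so only the baseline is shown. The sketch assumes the common formulation in which two words are compared through the distributions of their immediate left and right neighbors. The function names, the smoothing constant `eps`, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def context_distribution(sentences, word, offset):
    """Estimate p(neighbor | word) for the neighbor at `offset` (-1 = left, +1 = right)."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            j = i + offset
            if tok == word and 0 <= j < len(tokens):
                counts[tokens[j]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def kl(p, q, vocab, eps=1e-6):
    """KL(p || q) over `vocab`, with additive smoothing (eps is an assumed constant)."""
    z_p = sum(p.get(w, 0.0) + eps for w in vocab)
    z_q = sum(q.get(w, 0.0) + eps for w in vocab)
    result = 0.0
    for w in vocab:
        pw = (p.get(w, 0.0) + eps) / z_p
        qw = (q.get(w, 0.0) + eps) / z_q
        result += pw * math.log(pw / qw)
    return result


def symmetric_kl_distance(sentences, w1, w2):
    """Sum of symmetrized KL divergences over left and right context distributions."""
    vocab = {tok for s in sentences for tok in s}
    distance = 0.0
    for offset in (-1, 1):
        p = context_distribution(sentences, w1, offset)
        q = context_distribution(sentences, w2, offset)
        distance += kl(p, q, vocab) + kl(q, p, vocab)
    return distance


# Hypothetical toy corpus: "boston" and "denver" share contexts, "monday" does not.
toy_sentences = [
    ["fly", "to", "boston", "today"],
    ["fly", "to", "denver", "today"],
    ["leave", "on", "monday", "morning"],
]
```

On the toy corpus, `symmetric_kl_distance(toy_sentences, "boston", "denver")` is smaller than the distance between "boston" and "monday", so a clustering step would merge the two city names first.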
Funding
Supported by the National Natural Science Foundation of China under Grant Nos. 10925419, 90920302, 10874203, 60875014, 61072124, 11074275, and 11161140319.