Abstract: This paper describes a corpus-based investigation of dialogue acts (DAs). In particular, it attempts to answer questions about the empirical distribution of dialogue acts in a publicly accessible corpus of transcribed conversations and about the degree of lexical similarity between dialogue acts. The research will not only contribute to a deeper understanding of the lexical realisation of dialogue acts as communicative functions but also support dialogue act recognition. The Switchboard Dialogue Act Corpus is adopted, and the SWBD-DAMSL tags are used for the experiments. Two statistical measures were applied: lexical dispersion, and lexical similarity using Chi by degrees of freedom (CBDF). We show that the distribution of DAs is uneven, that the majority of word types occur in only one DA, and that only a small fraction of word types are commonly used by all DAs. Our hierarchical clustering of DAs based on CBDF suggests an intrinsic structure of DAs pertaining to fact finding and verification, information provision, and inter-speaker agreement and affinity building. The paper concludes with discussions and suggestions for future work.
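The two measures named in the abstract can be illustrated with a minimal Python sketch. It rests on stated assumptions rather than the paper's exact definitions: CBDF is taken to be the Pearson chi-square over a 2 x n contingency table of the n most frequent word types in two samples, divided by its n - 1 degrees of freedom (lower means more similar); lexical dispersion is approximated by the number of distinct DA tags a word type occurs under. The function names `cbdf` and `da_range`, the cutoff `n=500`, and the toy utterances are hypothetical and not drawn from the corpus.

```python
from collections import Counter


def cbdf(tokens_a, tokens_b, n=500):
    """Chi-square by degrees of freedom (CBDF) between two token lists.

    Assumed formulation: the n most frequent word types in the combined
    data form a 2 x n contingency table, and CBDF = chi-square / (n - 1).
    Lower values mean more similar word-frequency profiles.
    """
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    words = [w for w, _ in (freq_a + freq_b).most_common(n)]
    total_a = sum(freq_a[w] for w in words)
    total_b = sum(freq_b[w] for w in words)
    if len(words) < 2 or total_a == 0 or total_b == 0:
        raise ValueError("need two non-empty samples with at least two word types")
    grand = total_a + total_b
    chi2 = 0.0
    for w in words:
        row_total = freq_a[w] + freq_b[w]
        # One cell per sample: compare the observed count with its expected count.
        for observed, col_total in ((freq_a[w], total_a), (freq_b[w], total_b)):
            expected = row_total * col_total / grand
            chi2 += (observed - expected) ** 2 / expected
    # A 2 x n table has (2 - 1) * (n - 1) degrees of freedom.
    return chi2 / (len(words) - 1)


def da_range(utterances):
    """Map each word type to the number of distinct DA tags it occurs under.

    `utterances` is an iterable of (da_tag, tokens) pairs. This count is a
    simple dispersion proxy (hypothetical), not necessarily the paper's measure.
    """
    tags_per_word = {}
    for tag, tokens in utterances:
        for w in tokens:
            tags_per_word.setdefault(w, set()).add(tag)
    return {w: len(tags) for w, tags in tags_per_word.items()}


# Toy, hand-made utterances with SWBD-DAMSL-style tags (not corpus data).
data = [
    ("sd", "i think it is good".split()),
    ("sv", "i think so".split()),
    ("qy", "is it good".split()),
    ("b", "uh-huh".split()),
]
ranges = da_range(data)
# Share of word types that occur under exactly one DA tag.
share_single_da = sum(1 for k in ranges.values() if k == 1) / len(ranges)

# Pool the tokens of each DA tag, then compare two tags with CBDF.
tokens_by_tag = {}
for tag, toks in data:
    tokens_by_tag.setdefault(tag, []).extend(toks)
sd_vs_qy = cbdf(tokens_by_tag["sd"], tokens_by_tag["qy"])
```

For the clustering step, pairwise CBDF scores between all DA tags could be supplied as a condensed distance vector to an agglomerative routine such as `scipy.cluster.hierarchy.linkage`; which linkage method the authors used is not stated in the abstract.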
Funding: Supported in part by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (RGC Project No. 142711) and City University of Hong Kong (Project Nos. 9041694, 9610188, 7008062 and 7008002).