
基于对话内容的交互型文本会话主题挖掘 (cited by: 1)

Session topic mining for interactive text based on conversational content
Abstract Traditional topic mining models generally extract only document-level topics from interactive text. To mine session topics as well, and to make the mining model more broadly applicable, a session topic generation model for interactive text based on conversational content is proposed. First, the characteristics of interactive text are analyzed and, building on the concept of a topic tree, a dialog spanning tree with a five-layer structure is defined. On this basis, a session topic generation model (ST-LDA) is constructed from latent Dirichlet allocation (LDA). Finally, Gibbs sampling is used to perform inference on ST-LDA, yielding the session topics and their probability distributions. Validation on real data shows that ST-LDA can effectively mine session topics from interactive text. In addition, the results can reduce the complexity of classification algorithms and allow topic-participant associations to be traced back, and the approach generalizes well.
Source Telecommunications Science (《电信科学》), Peking University Core Journal, 2016, No. 9, pp. 139-145 (7 pages)
Funding National Natural Science Foundation of China (No. 61163005); Jiangxi Province Science and Technology Plan Fund (No. 2014ZBBE50008)
Keywords interactive text; conversation content; session topic mining; dialog spanning tree; latent Dirichlet allocation (LDA)
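
The abstract says that ST-LDA is built on LDA and inferred with Gibbs sampling, but it does not give the model's equations. As a rough illustration of that inference step only, the following is a minimal sketch of a collapsed Gibbs sampler for plain LDA, not for ST-LDA itself: the five-layer dialog spanning tree is not modeled. The function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are all assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for standard LDA (illustrative, not ST-LDA).

    docs is a list of documents, each a list of integer word ids in
    range(vocab_size). Returns (theta, phi): per-document topic
    distributions and per-topic word distributions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n_{k,w}
    topic_total = [0] * num_topics                               # n_k
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized full conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Point estimates from the final counts.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return theta, phi


# Toy corpus (hypothetical): three short "utterances" over a 4-word vocabulary.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=4)
```

Here `theta[d]` plays the role of a topic distribution for one text unit and `phi[k]` the word distribution of topic `k`. A session-level model such as the one the abstract describes would presumably add further conditioning on the dialog structure, which this sketch omits.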
