Conversation extraction in short text message streams based on multiple strategies (cited by: 2)
Abstract: The Internet carries a large volume of short text message streams. Conversation extraction groups the messages in such a stream so that messages on the same topic fall into the same conversation. Because message content, timing, and user relationships all affect extraction performance, this paper proposes a conversation extraction algorithm based on multiple strategies. First, the stream is segmented into conversation segments using content, time, and user relationships. Then, semantic similarity between segments is computed from word vectors and combined with temporal information to obtain a relatedness score, and the segments are clustered on that score to complete the extraction. Experiments on three datasets drawn from real chat logs show that the method outperforms traditional methods, with the overall F-measure improving by 38.5%, 15.7%, and 26.8%, respectively.
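The second stage described in the abstract (word-vector similarity combined with time, then clustering) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration rather than the paper's method: the `Segment` type, the exponential time decay with a `half_life` parameter, the multiplicative combination of similarity and decay, the `threshold` value, and the greedy single-pass clustering. Each segment is assumed to already carry a precomputed mean word vector.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """Hypothetical conversation segment produced by the segmentation stage."""
    vector: Sequence[float]  # assumed: mean word vector of the segment's messages
    start: float             # timestamp of the first message, in seconds
    end: float               # timestamp of the last message, in seconds


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relatedness(a: Segment, b: Segment, half_life: float = 600.0) -> float:
    """Semantic similarity damped by the time gap (assumed combination rule)."""
    # Gap between the two segments; overlapping segments have a gap of zero.
    gap = max(0.0, max(a.start, b.start) - min(a.end, b.end))
    decay = 0.5 ** (gap / half_life)  # halves every `half_life` seconds
    return cosine(a.vector, b.vector) * decay


def cluster(segments: List[Segment], threshold: float = 0.5,
            half_life: float = 600.0) -> List[List[Segment]]:
    """Greedy single-pass clustering of segments in chronological order."""
    conversations: List[List[Segment]] = []
    for seg in sorted(segments, key=lambda s: s.start):
        best, best_score = None, threshold
        for conv in conversations:
            # Score a conversation by its most related member segment.
            score = max(relatedness(seg, other, half_life) for other in conv)
            if score >= best_score:
                best, best_score = conv, score
        if best is None:
            conversations.append([seg])  # nothing related enough: new conversation
        else:
            best.append(seg)
    return conversations
```

With three segments whose vectors are `[1, 0]`, `[0, 1]`, and `[0.9, 0.1]` arriving a few seconds apart, the first and third end up in one conversation and the second in its own. A multiplicative decay is only one way to fold time into the score; a weighted sum of similarity and a time term would fit the abstract's description equally well.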
Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core Journal, 2016, No. 4, pp. 997-1002 (6 pages)
Funding: National "863" Program of China (2011AA7032030D); National Social Science Fund of China (14BXW028)
Keywords: conversation extraction; short text message; short text message stream; word vectors; chat log