
Conversation Detection and Organization of Mobile Text Messages (短信息的会话检测及组织)

Cited by: 3
Abstract: Mining the latent conversations in the large volume of text messages stored on a mobile phone is a challenging problem, because text messages lack metadata such as "subject" and "reply-to" that are commonly used for email thread analysis. This paper proposes a conversation recognition model for text messages based on a temporal clustering algorithm and topic detection. First, according to the temporal distribution of the message stream, all messages exchanged between the two parties are partitioned into candidate conversations. The candidates are then analyzed in more depth with a semantic topic model trained using latent Dirichlet allocation (LDA), which is used to measure the topical relevance between candidate conversations. Finally, the latent conversations are identified by merging candidate conversations according to their combined temporal and topical relevance. Experiments on a real dataset of 122 359 text messages produced by 50 university students over 6 months demonstrate the effectiveness of the algorithm.
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2012, Issue 10, pp. 2586-2599 (14 pages)
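Funding: National Basic Research Program of China (973 Program) (2009CB320504); National High Technology Research and Development Program of China (863 Program) (2011AA01A101)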
Keywords: text message; temporal clustering; topic; latent Dirichlet allocation
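The pipeline outlined in the abstract (temporal partitioning into candidate conversations, then merging candidates by combined temporal and topical relevance) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the fixed silence threshold stands in for its temporal clustering step, and the names `Message`, `split_by_gap`, `merge_candidates`, `horizon`, `alpha`, and `threshold` are hypothetical. Each message is assumed to already carry a topic distribution produced by an LDA model trained elsewhere.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class Message:
    timestamp: float          # seconds since some reference point
    topics: Sequence[float]   # assumed: topic distribution from an LDA model trained elsewhere


def split_by_gap(messages: List[Message], max_gap: float) -> List[List[Message]]:
    """Stand-in for temporal clustering: open a new candidate conversation
    whenever the silence between consecutive messages exceeds max_gap."""
    candidates: List[List[Message]] = []
    for msg in sorted(messages, key=lambda m: m.timestamp):
        if candidates and msg.timestamp - candidates[-1][-1].timestamp <= max_gap:
            candidates[-1].append(msg)
        else:
            candidates.append([msg])
    return candidates


def mean_topics(conv: List[Message]) -> List[float]:
    """Average the per-message topic distributions of one candidate."""
    k = len(conv[0].topics)
    return [sum(m.topics[i] for m in conv) / len(conv) for i in range(k)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here as a simple topical relevance measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_candidates(candidates: List[List[Message]], horizon: float,
                     alpha: float = 0.5, threshold: float = 0.6) -> List[List[Message]]:
    """Merge a candidate into its predecessor when the weighted sum of
    temporal relevance and topical relevance reaches the threshold.
    horizon, alpha, and threshold are illustrative parameters."""
    merged: List[List[Message]] = []
    for conv in candidates:
        if merged:
            prev = merged[-1]
            gap = conv[0].timestamp - prev[-1].timestamp
            time_score = max(0.0, 1.0 - gap / horizon)   # 1 when adjacent, 0 beyond horizon
            topic_score = cosine(mean_topics(prev), mean_topics(conv))
            if alpha * time_score + (1 - alpha) * topic_score >= threshold:
                prev.extend(conv)
                continue
        merged.append(list(conv))
    return merged


messages = [
    Message(0, [0.9, 0.1]),
    Message(60, [0.8, 0.2]),
    Message(1000, [0.85, 0.15]),
    Message(5000, [0.1, 0.9]),
]
candidates = split_by_gap(messages, max_gap=300)
conversations = merge_candidates(candidates, horizon=3600)
print([len(c) for c in candidates], [len(c) for c in conversations])
```

In this toy input the third message arrives after a long pause but shares the topic of the first two, so its candidate is merged back into the first conversation, while the last message differs in both time and topic and stays separate.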

References (14)

  • 1. Bollegala D, Matsuo Y, Ishizuka M. Measuring semantic similarity between words using Web search engines. In: Proc. of the 16th Int'l Conf. on World Wide Web (WWW 2007). New York: ACM Press, 2007. 757-766. [doi: 10.1145/1242572.1242675].
  • 2. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 2001,63(2):411-423. [doi: 10.1111/1467-9868.00293].
  • 3. Graham A, Garcia-Molina H, Paepcke A, Winograd T. Time as the essence for photo browsing through personal digital libraries. In: Proc. of the 2nd ACM/IEEE-CS Joint Conf. on Digital Libraries (JCDL 2002). New York: ACM Press, 2002. 326-335. [doi: 10.1145/544220.544301].
  • 4. Metzler D, Dumais S, Meek C. Similarity measures for short segments of text. In: Amati G, Carpineto C, Romano G, eds. Proc. of the 29th European Conf. on IR Research (ECIR 2007). Berlin, Heidelberg: Springer-Verlag, 2007. 16-27.
  • 5. Cooper M, Foote J, Girgensohn A, Wilcox L. Temporal event clustering for digital photo collections. ACM Trans. on Multimedia Computing, Communications, and Applications (TOMCCAP), 2005,1(3):269-288. [doi: 10.1145/1083314.1083317].
  • 6. Wang L, Jia Y, Han WH. Instant message clustering based on extended vector space model. In: Proc. of the 2nd Int'l Symp. on Advances in Computation and Intelligence (ISICA 2007). LNCS 4683, Berlin, Heidelberg: Springer-Verlag, 2007. 435-443. [doi: 10.1007/978-3-540-74581-5_48].
  • 7. Chang TH, Lee CH. Topic segmentation for short texts. In: Proc. of the 17th Pacific Asia Conf. on Language, Information and Computation. Singapore, 2003. 159-165. http://aclweb.org/anthology-new/Y/YO3/YO3-1018.pdf.
  • 8. Kleinberg J. Bursty and hierarchical structure in streams. Journal of Data Mining and Knowledge Discovery, 2003,7(4):373-397. [doi: 10.1023/A:1024940629314].
  • 9. Sun HJ, Wang SR, Jiang QS. FCM-Based model selection algorithms for determining the number of clusters. Pattern Recognition, 2004,37(10):2027-2037. [doi: 10.1016/j.patcog.2004.03.012].
  • 10. Phan X-H, Nguyen LM, Horiguchi S. Learning to classify short and sparse text & Web with hidden topics from large-scale data collections. In: Proc. of the 17th Int'l Conf. on World Wide Web (WWW 2008). New York: ACM Press, 2008. 91-100. [doi: 10.1145/1367497.1367510].

