Abstract
Faced with large volumes of heterogeneous instant messaging (IM) data, forensic investigators find it difficult to locate case-related data quickly. This paper proposes an IM forensics method based on the PLSA (probabilistic latent semantic analysis) algorithm: PLSA is used for topic mining so that suspicious, case-related data can be retrieved rapidly. A custom lexicon is built and the vector weights of its terms are adjusted dynamically to improve the accuracy of PLSA topic mining, and the vector values of the topics in chat sessions are visualized. Experimental results show that the method achieves higher precision, recall, and F1 score than PLSA alone.
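The topic-mining step can be illustrated with a minimal sketch of PLSA fitted by EM on a document-term count matrix, where each "document" would be a chat session. The abstract does not give the paper's weighting scheme, so `term_weights` below is a hypothetical per-term multiplier standing in for the custom-lexicon weights; the function name `plsa`, its parameters, and the toy counts are likewise assumptions made for illustration, not the authors' implementation.

```python
import random


def plsa(counts, n_topics, n_iter=50, term_weights=None, seed=0):
    """Fit PLSA by EM; returns (P(z|d), P(w|z)) as nested lists.

    counts       -- document-term count matrix (list of equal-length lists)
    term_weights -- hypothetical per-term multipliers (assumption: stands in
                    for the lexicon weighting, which the abstract does not specify)
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])
    if term_weights is None:
        term_weights = [1.0] * n_terms
    # Weighted counts n(d, w).
    n = [[counts[d][w] * term_weights[w] for w in range(n_terms)]
         for d in range(n_docs)]

    def normalize(row):
        total = sum(row)
        if total <= 0:
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random initialization of P(z|d) and P(w|z).
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_terms)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_terms for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_terms):
                if n[d][w] == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    c = n[d][w] * post[z]
                    new_z_d[d][z] += c
                    new_w_z[z][w] += c
        # M-step: renormalize the expected counts.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


# Toy example: four sessions over a four-term vocabulary.
counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
]
# Up-weight term 0 as if it were a lexicon entry (illustrative value only).
p_z_d, p_w_z = plsa(counts, n_topics=2, term_weights=[2.0, 1.0, 1.0, 1.0])
```

Each row of `p_z_d` is a session's topic distribution, which is the kind of per-session topic vector the abstract describes visualizing.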
Source
《武汉大学学报(理学版)》
CAS
CSCD
PKU Core Journals
2016, No. 2, pp. 122-126 (5 pages)
Journal of Wuhan University:Natural Science Edition
Funding
National Natural Science Foundation of China (60903220)
Zhengzhou Science and Technology Research Project (10PTGG341-5)
Keywords
instant messaging
forensics
topic mining
PLSA (probabilistic latent semantic analysis) algorithm
vector weight