Abstract
In digital forensics, chat records account for the largest share of the collected data, and analyzing their content is both the priority and the main difficulty. Templates, semantic analysis, and an HMM-Viterbi model are used to extract important information from the content; case-related keywords are mined by computing text feature values and by using deep learning to measure semantic distance; and the TextRank algorithm extracts content keywords and generates automatic summaries. Together, these methods let investigators quickly grasp the main content and key information in large volumes of chat data, improving working efficiency.
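The abstract names TextRank for keyword extraction but gives no implementation details. As a rough illustration only, the generic algorithm can be sketched as a PageRank-style iteration over a word co-occurrence graph. The function name `textrank_keywords`, its parameters (`window`, `damping`, `iterations`, `top_k`), and the sample tokens are all hypothetical and are not taken from the paper; the input is assumed to be already tokenized.

```python
from collections import defaultdict
from itertools import combinations


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=3):
    """Rank tokens with a plain TextRank pass over a co-occurrence graph.

    Generic sketch, not the paper's implementation; all names and
    default values here are illustrative assumptions.
    """
    # Build an undirected graph: distinct tokens that co-occur within
    # a sliding window of `window` tokens are linked.
    neighbors = defaultdict(set)
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        for a, b in combinations(span, 2):
            if a != b:
                neighbors[a].add(b)
                neighbors[b].add(a)

    # PageRank-style iteration with uniform edge weights.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[other] / len(neighbors[other])
                for other in neighbors[word]
            )
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


# Hypothetical, already-tokenized chat fragment.
sample_tokens = ["transfer", "money", "account", "money", "bank",
                 "money", "card", "money", "cash"]
top_keywords = textrank_keywords(sample_tokens, window=2, top_k=3)
```

For Chinese chat text, the token list would have to come from a word segmenter and would normally be filtered for stop words first; both steps are assumed here rather than shown.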
Authors
曾超
刘晓宇
林艺滨
温若辉
ZENG Chao, LIU Xiao-yu, LIN Yi-bin, WEN Ruo-hui (Xiamen Meiya Pico Information Co., Ltd., Xiamen 361008, China; Cyber Security Department, Beijing 100006, China)
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2016, Issue B12, pp. 228-230 (3 pages)
Computer Science