Abstract
As information technology has become part of everyday life, short Internet texts are appearing more and more often as electronic evidence. Research on this problem abroad is extensive and has accumulated considerable practical experience. However, because of the characteristics and complexity of the Chinese language, results developed in Western countries, mostly for English, do not transfer well to Chinese. Authorship attribution algorithms designed for Chinese short text messages are therefore of practical significance. Based on an N-gram model and the likelihood ratio (LR) method, this paper attributes the authorship of short texts using the distribution of word frequencies. Experimental results show that the method achieves fairly good attribution performance.
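As a rough illustration of the idea summarised above, the sketch below scores a disputed short message against two candidate authors with a log-likelihood ratio computed from n-gram frequency distributions. It is not the paper's algorithm. The following are all assumptions made for illustration: character bigrams in place of segmented words (to avoid a Chinese word-segmentation step), add-one smoothing, the two-author setup, the hypothetical names `char_ngrams`, `NgramProfile` and `log_likelihood_ratio`, and the invented toy messages.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping character n-grams (assumption: no word segmentation)."""
    text = text.replace(" ", "")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramProfile:
    """Hypothetical n-gram frequency profile built from one candidate author's messages."""

    def __init__(self, samples: list[str], n: int = 2):
        self.n = n
        self.counts = Counter(g for s in samples for g in char_ngrams(s, n))
        self.total = sum(self.counts.values())

    def log_prob(self, gram: str, vocab_size: int) -> float:
        # Add-one (Laplace) smoothing so an unseen n-gram never gets probability zero.
        return math.log((self.counts[gram] + 1) / (self.total + vocab_size))


def log_likelihood_ratio(text: str, profile_a: NgramProfile, profile_b: NgramProfile) -> float:
    """log LR = log P(text | A) - log P(text | B); a positive value favours author A."""
    grams = char_ngrams(text, profile_a.n)
    # Both models share one vocabulary size so their smoothed probabilities are comparable.
    vocab = set(profile_a.counts) | set(profile_b.counts) | set(grams)
    v = len(vocab)
    return sum(profile_a.log_prob(g, v) - profile_b.log_prob(g, v) for g in grams)


# Toy usage with invented messages from two hypothetical authors.
author_a = NgramProfile(["今天晚上一起吃饭吧", "晚上一起去看电影吧"])
author_b = NgramProfile(["明早八点开会，请准时到达", "会议材料请提前准备"])
score = log_likelihood_ratio("晚上一起吃饭", author_a, author_b)
print(score > 0)  # True: the disputed message shares its bigrams with author A
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. A real system would also need a decision threshold or calibration for the LR value, which this sketch leaves out.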
Author
Li Menglin (李孟林), Cyber Crime Investigation Department, Criminal Investigation Police University of China, Shenyang 110854
Source
Journal of Information Security Research (《信息安全研究》), 2019, Issue 9, pp. 843-846 (4 pages)