Research on author identity recognition of Chinese microblog based on deep learning (基于深度学习的中文微博作者身份识别研究)

Cited by: 5
Abstract: Author identification has long played an important role in public security work and forensic document examination. Existing approaches to modeling an author's language style are cumbersome, and their hand-engineered text features do not generalize. To address this, the paper proposes CABLSTM, a model for identifying the authors of Chinese microblog posts that requires no expert feature modeling, and tests its accuracy on a public microblog corpus. To extract as much as possible from short texts, the model fuses an attention mechanism into a CNN and removes the pooling layer, uses a bidirectional LSTM to capture contextual information, and outputs the identification result through a softmax layer. Experimental results show that, on the Chinese microblog author identification task, the model achieves some improvement in accuracy, recall, and F-measure over traditional machine learning algorithms, TextCNN, and LSTM.
Authors: Xu Xiaolin (徐晓霖); Cai Manchun (蔡满春); Lu Tianliang (芦天亮). School of Information Technology & Network Security, People's Public Security University of China, Beijing 102623, China
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 1, pp. 16-18, 25 (4 pages)
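Funding: National Key R&D Program of China, key special project (2017YFB0802804); National Natural Science Foundation of China (61602489); People's Public Security University of China 2018 Basic Scientific Research Fund, research institution project (2018JKF504).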
Keywords: author identification; long short-term memory (LSTM) network; convolutional neural network (CNN); automatic feature extraction
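
The data flow described in the abstract (convolution without pooling, attention, bidirectional LSTM, softmax) can be illustrated with a small forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say how attention is fused into the CNN, so the dot-product attention over positions used here is a guess, and every name, layer size, and the random initialization (`cablstm_forward`, `init_params`, `query`, and so on) is hypothetical. It uses only the Python standard library to keep the steps visible; a real model would be built and trained in a deep learning framework.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def conv_no_pool(seq, filters, width=3):
    """Zero-padded 1-D convolution + ReLU. No pooling: one feature vector per token."""
    dim = len(seq[0])
    pad = [[0.0] * dim for _ in range(width // 2)]
    padded = pad + seq + pad
    out = []
    for t in range(len(seq)):
        window = [v for vec in padded[t:t + width] for v in vec]
        out.append([max(0.0, y) for y in matvec(filters, window)])
    return out

def attend(features, query):
    """Assumed attention: reweight each position by a softmax over dot-product scores."""
    weights = softmax([sum(q * f for q, f in zip(query, feat)) for feat in features])
    return [[w * f for f in feat] for w, feat in zip(weights, features)], weights

def lstm(seq, W, hidden):
    """One LSTM direction; W maps [x, h, 1] to the four gate pre-activations."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = matvec(W, x + h + [1.0])
        i, f, o, g = (z[k * hidden:(k + 1) * hidden] for k in range(4))
        c = [sigmoid(fj) * cj + sigmoid(ij) * math.tanh(gj)
             for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [sigmoid(oj) * math.tanh(cj) for oj, cj in zip(o, c)]
    return h  # final hidden state of this direction

def cablstm_forward(embedded, params):
    feats = conv_no_pool(embedded, params["filters"])
    attended, weights = attend(feats, params["query"])
    fwd = lstm(attended, params["lstm_fwd"], params["hidden"])
    bwd = lstm(attended[::-1], params["lstm_bwd"], params["hidden"])
    probs = softmax(matvec(params["out"], fwd + bwd))  # one probability per author
    return probs, weights, feats

def init_params(dim, n_filters, hidden, n_authors, width=3, seed=0):
    rng = random.Random(seed)
    return {
        "hidden": hidden,
        "filters": rand_matrix(n_filters, width * dim, rng),
        "query": [rng.uniform(-0.1, 0.1) for _ in range(n_filters)],
        "lstm_fwd": rand_matrix(4 * hidden, n_filters + hidden + 1, rng),
        "lstm_bwd": rand_matrix(4 * hidden, n_filters + hidden + 1, rng),
        "out": rand_matrix(n_authors, 2 * hidden, rng),
    }

# Toy input: one post of 12 tokens with 8-dimensional embeddings, 4 candidate authors.
rng = random.Random(1)
post = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(12)]
params = init_params(dim=8, n_filters=6, hidden=5, n_authors=4)
probs, weights, feats = cablstm_forward(post, params)
print(len(feats), len(probs), round(sum(probs), 6))
```

Because the pooling layer is omitted, `feats` keeps one vector per input token, so the bidirectional LSTM sees the whole sequence of a short post rather than a pooled summary. The weights are untrained, so the printed probabilities carry no meaning beyond showing the shapes.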
