
A Study of Authorship Identification Methods Based on Toutiao Data

Abstract: In the era of big data, machine authors are becoming increasingly creative, and a growing share of media content will be produced automatically by them. It is therefore important to distinguish works by human authors from works by machine authors across news reports, literary works, and user reviews. This paper proposes an authorship identification method. Through observation and analysis, it finds that machine authors differ significantly from human authors in four respects: lexical features, syntactic features, semantic features, and publishing device. The features in these four dimensions are analyzed in depth, feature selection is performed, and the selected features are used to build an authorship identification model.
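Author: Li Kaiyuan (李开元)
Source: Science and Technology Innovation Herald (《科技创新导报》), 2019, No. 11, pp. 146-147, 149 (3 pages in total)
Funding: Supported by the Dalian University of Foreign Languages Student Innovation and Entrepreneurship Training Program (Project No. 201810172026)
Keywords: authorship identification; machine user; big data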
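
The abstract outlines a three-step pipeline: extract features, select the most discriminative ones, and build an identification model. The record does not say which concrete features, selection criterion, or classifier the paper uses, so the following is only a minimal sketch under stated assumptions: the four features, the mean-gap score, the nearest-centroid classifier, and the toy posts are all hypothetical, and semantic features are omitted because they would need a lexicon or language model.

```python
import re
from statistics import mean

# Hypothetical feature set; the paper's actual features are not given in the abstract.
FEATURE_NAMES = ["type_token_ratio", "avg_sentence_len", "punct_ratio", "is_mobile"]

def extract_features(text, device):
    """Map one post to a numeric vector (character-level, no word segmentation)."""
    chars = [c for c in text if not c.isspace()]
    tokens = [c for c in chars if c.isalnum()]            # letters, digits, CJK characters
    sentences = [s for s in re.split(r"[。!?!?.]+", text) if s.strip()]
    return [
        len(set(tokens)) / max(len(tokens), 1),            # lexical: vocabulary richness
        len(tokens) / max(len(sentences), 1),              # syntactic proxy: sentence length
        (len(chars) - len(tokens)) / max(len(chars), 1),   # syntactic proxy: punctuation share
        1.0 if device in {"ios", "android"} else 0.0,      # publishing device
    ]

def select_features(X, y, k):
    """Keep the k features whose class means differ most, scaled by the feature's range."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        human = mean(v for v, label in zip(col, y) if label == "human")
        machine = mean(v for v, label in zip(col, y) if label == "machine")
        spread = (max(col) - min(col)) or 1.0              # avoid division by zero
        scores.append((abs(human - machine) / spread, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

def fit_centroids(X, y, keep):
    """Nearest-centroid model: one mean vector per class over the kept features."""
    centroids = {}
    for label in set(y):
        rows = [[row[j] for j in keep] for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, features, keep):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    point = [features[j] for j in keep]
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(point, center))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Toy, made-up posts: (text, publishing device, label).
posts = [
    ("今天去公园散步,看到很多花。心情很好!", "ios", "human"),
    ("周末和朋友吃火锅,聊了很久。真开心!", "android", "human"),
    ("点击查看。点击查看。点击查看。", "web", "machine"),
    ("最新消息。最新消息。最新消息。", "web", "machine"),
]
X = [extract_features(text, device) for text, device, _ in posts]
y = [label for _, _, label in posts]

keep = select_features(X, y, k=2)
centroids = fit_centroids(X, y, keep)
print([FEATURE_NAMES[j] for j in keep])
print(predict(centroids, extract_features("关注我们。关注我们。关注我们。", "web"), keep))
```

The features are left unscaled here, so a wide-ranging feature such as sentence length would dominate the distance if it were selected; a fuller version would standardize the features and evaluate on held-out data.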

