Abstract
In the era of big data, as machine writers become increasingly creative, the media will carry a growing volume of automatically generated content. Effectively distinguishing the works of human authors from those of machine authors in complex news reports, literary works, and user reviews is therefore important. This paper proposes an authorship identification method. Through close observation and analysis, we find that machine authors differ significantly from human authors in four respects: lexical features, syntactic features, semantic features, and publishing device. We analyze the features along these four dimensions in depth, perform feature selection, and use the selected features to build an authorship identification model.
Source
Science and Technology Innovation Herald (《科技创新导报》)
2019, No. 11, pp. 146-147, 149 (3 pages)
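Funding
Supported by the Dalian University of Foreign Languages Student Innovation and Entrepreneurship Training Program (Project No. 201810172026)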
Keywords
Authorship identification
Machine user
Big data