Abstract
Authorship identification is an interdisciplinary field concerned with inferring the author of an unknown text from texts of known authorship. Traditional research has generally relied on empirical knowledge from literature or linguistics, whereas modern research mainly uses mathematical methods to quantify an author's writing style. In recent years, with the development of cognitive science, systems science, and information technology, authorship identification has attracted growing attention from researchers. This paper reviews the methods and ideas of modern authorship identification research, mainly from the perspective of computational linguistics. First, the history of authorship identification is briefly introduced. Then, stylistic features, authorship identification methods, and the multi-faceted research in the field are described in detail. Next, related evaluation campaigns, datasets, and evaluation metrics are presented. Finally, open problems in the field are pointed out, and the development trends of authorship identification are analyzed and projected in light of these problems.
Authors
ZHANG Yang (张洋)
JIANG Ming-Hu (江铭虎)
Lab of Computational Linguistics, School of Humanities, Tsinghua University, Beijing 100084
Source
Acta Automatica Sinica (《自动化学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2021, No. 11, pp. 2501-2520 (20 pages)
Funding
Supported by the National Natural Science Foundation of China (62036001).
Keywords
authorship identification
stylometry
writing style
evaluation metrics