Abstract
Quantitative analysis of policy texts has attracted wide attention from policy researchers in recent years. Because its conclusions rest on objective data, it can largely overcome the subjectivity and randomness of earlier qualitative policy analysis. Existing quantitative approaches to policy text analysis have two main drawbacks. First, policy texts are mostly collected by hand, so the resulting datasets are small. Second, policy identification relies mainly on human experience, which amounts to biased induction over small datasets. To address these issues, this paper proposes a policy identification method based on a pretrained language model, which achieves good performance on a relatively large-scale policy text dataset.
Authors
朱娜娜
王航
张家乐
孙英巍
ZHU Nana; WANG Hang; ZHANG Jiale; SUN Yingwei (School of Information Management, Heilongjiang University, Harbin, Heilongjiang 150080, China; Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; Harbin University Library, Harbin, Heilongjiang 150086, China; Party School of Harbin Bureau Group Company, Harbin, Heilongjiang 150001, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal
2022, Issue 2, pp. 104-110 (7 pages)
Journal of Chinese Information Processing
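Funding
National Social Science Fund of China (15ATQ008)
Art Science Planning Project of the Heilongjiang Provincial Department of Culture (2019C027)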
Keywords
pretraining
language model
policy identification