Abstract
[Purpose/significance] Identifying and analyzing policy tools is an important method in policy research, but this work is still done mostly by hand. This paper applies deep learning to identify policy tools automatically, with the aim of making policy tool identification more efficient. [Method/process] We designed and implemented an experimental workflow for automatic policy tool identification: policy data collection and cleaning, manual indexing of policy tools, model training, and interpretation of results. Using government information disclosure policies from Beijing, Shanghai, Guangzhou and Guiyang as a case study, we compared traditional machine learning methods with deep learning methods on the policy tool identification task. We also proposed incorporating global, policy-level information when identifying the policy tools in each paragraph, and our experiments demonstrated that this approach is effective. [Result/conclusion] The deep learning model CNN achieved an accuracy of 76.51% on the full test set, and the CNN model that incorporates global information achieved 77.13%. When only the model's high-confidence results were evaluated, the CNN model with global information reached an accuracy of 95.44% on 55.63% of the test data. This accuracy meets practical requirements, which means that more than half of policy tool indexing can rely on the model's high-confidence results without manual review. Applying deep learning to the automatic identification of policy tools achieved good results: it improves the efficiency of policy tool indexing and provides positive experience for automatic policy tool identification on large data volumes.
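The high-confidence evaluation described in the abstract (measuring accuracy only on the share of test paragraphs where the model is confident, and reporting that share as coverage) can be sketched as follows. This is a minimal sketch, assuming the classifier outputs one probability distribution per paragraph and that "confidence" means the top-class probability compared against a fixed threshold; the function name `selective_accuracy`, the threshold rule and the toy data are hypothetical, and the abstract does not say which confidence measure or threshold the authors actually used. Under this reading, the reported 55.63% and 95.44% would be the coverage and accuracy returned at some threshold.

```python
from typing import Sequence, Tuple


def selective_accuracy(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Hypothetical helper: return (coverage, accuracy) for predictions whose
    top-class probability is at least `threshold`.

    coverage = fraction of all samples that pass the threshold
    accuracy = fraction of the passing samples that are predicted correctly
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not labels:
        return 0.0, 0.0

    kept = 0
    correct = 0
    for row, label in zip(probs, labels):
        # Assumed confidence measure: the highest class probability.
        predicted = max(range(len(row)), key=row.__getitem__)
        if row[predicted] >= threshold:
            kept += 1
            if predicted == label:
                correct += 1

    coverage = kept / len(labels)
    accuracy = correct / kept if kept else 0.0
    return coverage, accuracy


if __name__ == "__main__":
    # Toy data: 4 paragraphs, 3 hypothetical policy tool classes.
    example_probs = [
        [0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],
        [0.10, 0.85, 0.05],
        [0.20, 0.20, 0.60],
    ]
    example_labels = [0, 1, 1, 0]
    for t in (0.0, 0.5, 0.8):
        cov, acc = selective_accuracy(example_probs, example_labels, t)
        print(f"threshold={t:.1f} coverage={cov:.2%} accuracy={acc:.2%}")
```

Raising the threshold trades coverage for accuracy: in the toy data, the 0.8 threshold keeps half of the paragraphs and gets all of them right, while the 0.0 threshold keeps everything at 50% accuracy. Paragraphs that fall below the threshold are the ones that would still go to manual review.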
Authors
Li Na
Jiang Enbo
Zhu Yizhen
Liu Ting
(Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610041; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190)
Source
Library and Information Service (《图书情报工作》)
CSSCI
Peking University Core Journal
2021, No. 7, pp. 115-122 (8 pages)