
Automatic Annotation Method for Subject Classification of Policy Texts on Government Official Websites

Cited by: 4
Abstract: Governments at all levels have taken a series of measures to promote open government information and have achieved phased results. In practice, however, open official documents published on government websites often lack subject classification labels or are labeled inconsistently, and this has become a technical bottleneck that hinders the use of open government information. How to label the massive volume of documents on existing government platforms accurately and consistently, so that they can support in-depth retrieval and recommendation services, is a key problem that urgently needs to be solved. Based on an in-depth investigation, this paper proposes an automated subject classification method for open government documents. The method is built on a CNN-LSTM model and incorporates the semantic features of a pre-trained BERT model, with the aim of classifying open government documents accurately by subject. The model's overall accuracy for subject classification prediction is 63.52%, and its best F1 value reaches 63.59%, which offers a feasible solution to the problem of missing subject classification labels on government documents. The method can be combined with information retrieval and recommendation systems to provide the public with more precise government document services.
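Authors: LU Xiaobin (卢小宾); LU Guoxuan (鲁国轩); YANG Guancan (杨冠灿); QI Tianjiao (祁天娇) (School of Information Resource Management, Renmin University of China, Beijing 100872, China)
Source: Archives Science Bulletin (《档案学通讯》), CSSCI, Peking University Core Journal, 2022, No. 5, pp. 19-27 (9 pages)
Funding: Key Project of the National Social Science Fund of China, "Research on the Methodological System of Industrial Technology Intelligence Analysis in the New Era" (21ATQ008).
Keywords: Policy text; Subject classification; Pre-trained BERT model; Annotation method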
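
The abstract reports two evaluation figures, overall accuracy and an F1 value, for a multi-class subject classifier. The following is a minimal sketch of how such figures are commonly computed, not the authors' evaluation code. The abstract does not say which F1 averaging was used, so the `average` parameter (macro or weighted) is an assumption, and the function names and topic labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted topic equals the gold topic."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_value(y_true, y_pred, average="macro"):
    """Multi-class F1. The averaging scheme is an assumption, not from the paper."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of gold documents per topic
    per_class = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class[label] = 2 * tp / denom if denom else 0.0
    if average == "macro":
        # Unweighted mean over topics.
        return sum(per_class.values()) / len(labels)
    if average == "weighted":
        # Mean over topics, weighted by each topic's gold frequency.
        return sum(per_class[l] * support[l] for l in labels) / len(y_true)
    raise ValueError("average must be 'macro' or 'weighted'")


# Hypothetical toy labels, for illustration only.
gold = ["economy", "economy", "health", "education"]
pred = ["economy", "health", "health", "education"]
print(accuracy(gold, pred), f1_value(gold, pred, "macro"), f1_value(gold, pred, "weighted"))
```

With single-label classification, micro-averaged F1 equals accuracy, so a reported F1 that differs from accuracy usually indicates macro or weighted averaging.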
