Abstract
Government information disclosure in China is currently governed mainly by the Regulations of the People's Republic of China on the Disclosure of Government Information (《中华人民共和国政府信息公开条例》). From the user's perspective, however, disclosed information has to be adapted to different usage scenarios, which makes automated indexing of official documents valuable. Using natural language processing, this paper analyzes word frequency, part of speech, and word sense to extract recurring patterns in official document titles, and applies them to automatically index 4,388 State Council documents issued from 1969 to 2018. Regional keywords and industry keywords serve as worked examples: the keywords extracted after indexing can be used by other channels for search and further processing. The study indexes titles only; full-text indexing has not yet been performed.
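The abstract describes the method only at a high level. The following sketch illustrates two of its ingredients under stated assumptions: splitting a title according to a recurring pattern, and tagging regional and industry keywords with a dictionary before counting keyword frequency. The keyword lists, the `<issuer>关于<subject>的<genre>` title pattern, the example titles, and the function names `parse_title`, `index_title`, and `keyword_frequencies` are all hypothetical. Forward maximum matching is a simple stand-in for whatever matching the authors actually used, and the paper's part-of-speech and word-sense analysis, which would need a Chinese segmenter and tagger, is deliberately left out.

```python
import re
from collections import Counter

# Hypothetical lexicons for illustration only; the paper's actual regional and
# industry keyword lists are not given in the abstract.
REGION_TERMS = {"北京市", "上海市", "广东省", "西藏自治区"}
INDUSTRY_TERMS = {"农业", "铁路", "教育", "金融"}

# Assumed title pattern: "<issuer>关于<subject>的<genre>", e.g. 国务院关于...的通知.
TITLE_PATTERN = re.compile(
    r"^(?P<issuer>.+?)关于(?P<subject>.+)的(?P<genre>通知|决定|意见|批复)$"
)


def parse_title(title):
    """Split a title into issuer, subject and document genre, or return None."""
    match = TITLE_PATTERN.match(title)
    return match.groupdict() if match else None


def index_title(title, region_terms=REGION_TERMS, industry_terms=INDUSTRY_TERMS):
    """Tag a title with regional and industry keywords using forward maximum matching."""
    lexicon = {term: "region" for term in region_terms}
    lexicon.update({term: "industry" for term in industry_terms})  # industry wins on overlap
    max_len = max(len(term) for term in lexicon)
    tags = {"region": [], "industry": []}
    i = 0
    while i < len(title):
        # Try the longest candidate first so a full name beats any shorter prefix.
        for size in range(min(max_len, len(title) - i), 0, -1):
            word = title[i:i + size]
            if word in lexicon:
                tags[lexicon[word]].append(word)
                i += size
                break
        else:
            i += 1  # no dictionary term starts here; move on one character
    return tags


def keyword_frequencies(titles):
    """Count how often each indexed keyword occurs across a collection of titles."""
    counts = Counter()
    for title in titles:
        for terms in index_title(title).values():
            counts.update(terms)
    return counts


if __name__ == "__main__":
    # Hypothetical titles written for this example, not taken from the corpus.
    titles = [
        "国务院关于支持西藏自治区农业发展的意见",
        "国务院关于加强铁路和农业建设的通知",
    ]
    for t in titles:
        print(parse_title(t), index_title(t))
    print(keyword_frequencies(titles).most_common())
```

Scanning longest-match-first matters for Chinese place names: with a lexicon holding both a short form and a full administrative name, a shortest-match scan could tag the short form and leave the rest of the name unindexed.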
Authors
JIANG HuaLi
CAO Qi
CHEN Gang
Affiliations: School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China; Greysh Group Co., Ltd., Beijing 100080, China
Source
Digital Library Forum (《数字图书馆论坛》)
CSSCI
2019, No. 1, pp. 43-49 (7 pages)
Keywords
Government Information Disclosure
Text Mining
Automated Indexing