Abstract
Policy text classification draws on natural language processing (NLP), machine learning, and policy analysis, and has important applications in policy management, research, and information services. First, to address the scarcity of public resources for policy text, a semi-automatic method that combines domain knowledge with NLP is proposed for building policy text classification datasets, and a sentence-level natural resources policy text classification dataset is constructed. Second, exploiting the characteristics of policy texts, a deep-learning-based policy text classification method with adaptive title-information enhancement is proposed and applied as an extension to existing mainstream deep learning models. Finally, experiments on the natural resources policy text classification dataset show that, with this method applied, the accuracy of five commonly used deep learning classification models improves by more than 3% and the macro-averaged F1 score improves by more than 5%.
Authors
胡容波
郭诚
王锦浩
方金云
HU Rongbo; GUO Cheng; WANG Jinhao; FANG Jinyun (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; Information Center of Ministry of Natural Resources, Beijing 100036; University of Chinese Academy of Sciences, Beijing 100190)
Source
《高技术通讯》
CAS
2023, No. 7, pp. 692-703 (12 pages)
Chinese High Technology Letters
Funding
Beijing Science and Technology Research Project (A201908230146)
Key Research and Development Program of Hebei Province (20310106D).
Keywords
policy text
text classification
deep learning
natural resources
delay decision
dataset construction