
基于深度学习的我国科技政策属性识别 (cited by: 1)

Identification of China's S&T policy properties based on deep learning
Abstract: Text analysis based on deep learning currently focuses mostly on short-text tasks such as public opinion monitoring and sentiment analysis of microblogs, online comments, and news headlines. Far less work addresses property identification and long-text classification for policy documents, full-text papers, and full-text patents, which leaves room for further study. Compared with traditional machine learning models, deep learning has clear advantages in natural language processing (NLP) and text feature extraction: pre-trained language models reduce the manual effort of feature engineering, which makes deep learning promising for tasks such as policy property identification and policy instrument recognition.

This paper addresses the automatic identification of the properties of China's science and technology (S&T) policies, which are divided into three types: guiding, compulsory, and encouraging. Several popular deep learning models are compared on this task using S&T policies issued by Chinese local governments. The paper also gives a theoretical analysis of three related computational questions: (1) how the length of the text taken from each policy affects property identification; (2) how text data augmentation affects it; and (3) how to estimate the information content of policy texts. Together these extend the application of deep learning in scientometrics and informetrics, especially in the analysis of S&T policy texts.

The theoretical and empirical results show that, after data augmentation with the EDA (Easy Data Augmentation) method, which earlier studies had reported as effective for English text, the representative deep learning models all improve significantly on the fairly abstract task of S&T policy property identification. The accuracy of EDA+Bi-LSTM-Attention exceeds 88%, and the average accuracy of the other models tested (TextCNN, Bi-LSTM, RCNN, CapsNet, FastText, etc.) also exceeds 80% after augmentation. However, increasing the truncation length from 500 to 2000 words does not significantly improve property identification for Chinese S&T policies. This suggests that the full text of a policy may be unnecessary in similar long-text tasks, a result that may be useful for later studies of policy texts. The study offers insight and reference value for quantitative analysis in S&T management, including automatic identification of S&T policy properties, Chinese long-text classification, and policy instrument identification. Because the policy corpus is limited, the findings may be open to question; whether the models discussed here remain effective on other policy sources, such as energy, environmental, and financial policies, should be explored in future work.
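The abstract names EDA augmentation and a 500-to-2000-word truncation length but gives no implementation details. The following is a minimal sketch of the four standard EDA operations (synonym replacement, random insertion, random swap, random deletion) applied to a pre-segmented token list, plus a simple truncation helper. The toy `SYNONYMS` table, the function names, and the defaults `alpha=0.1`, `p_delete=0.1`, and `max_len=500` are illustrative assumptions, not the authors' settings.

```python
import random

# Toy synonym table (hypothetical; a real run would need a Chinese thesaurus).
SYNONYMS = {
    "鼓励": ["支持", "提倡"],
    "企业": ["公司"],
    "创新": ["革新"],
}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have an entry in SYNONYMS."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(tokens, n, rng):
    """Insert a synonym of a random token at a random position, n times."""
    out = list(tokens)
    for _ in range(n):
        candidates = [t for t in out if t in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(tokens, n, rng):
    """Swap two randomly chosen positions, n times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def truncate(tokens, max_len=500):
    """Keep only the first max_len tokens of a long policy text."""
    return tokens[:max_len]


def eda(tokens, alpha=0.1, p_delete=0.1, seed=0):
    """Return one augmented variant per EDA operation."""
    rng = random.Random(seed)
    n = max(1, int(alpha * len(tokens)))  # number of edits per operation
    return [
        synonym_replacement(tokens, n, rng),
        random_insertion(tokens, n, rng),
        random_swap(tokens, n, rng),
        random_deletion(tokens, p_delete, rng),
    ]
```

In use, each training document would be segmented into tokens, truncated with `truncate`, and passed to `eda`; the variants inherit the label of the original document. The classifier itself (for example Bi-LSTM-Attention) is outside the scope of this sketch.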
Authors: Li Munan (李牧南); Wang Liang (王良); Lai Huapeng (赖华鹏) (School of Business Administration, South China University of Technology, Guangzhou 510641, Guangdong, China; Guangdong Key Lab on Innovation Methods & Decision Management System, Guangzhou 510641, Guangdong, China)
Source: Science Research Management (《科研管理》), 2024, No. 2, pp. 1-11 (11 pages). Indexed in CSSCI, CSCD, and the Peking University Core Journals list (北大核心).
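Funding: National Natural Science Foundation of China, General Program, "Research on emerging technology risk mining based on multi-source data fusion and machine learning" (72074081, 2021.01-2024.12); National Social Science Fund of China, Key Project, "Research on strategic issues in accelerating China's self-reliance and self-strengthening in science and technology" (22AZD035, 2022.05-2025.12).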
Keywords: deep learning; science and technology policy; property identification; data augmentation; text classification