Text Mining on the Government Work Reports of the State Council (1954-2017) and Social Transformation Research

Original title: 国务院政府工作报告(1954-2017)文本挖掘及社会变迁研究

Cited by: 23
Abstract: The government work reports of the State Council are comprehensive policy texts that serve as a governing program. Applying text mining to the reports from successive years gives a multi-granularity, multi-level quantitative analysis that helps readers quickly grasp how the content of the domain has developed and uncover regularities of social transformation. First, the reports are preprocessed with a Chinese word segmentation tool combined with three dictionaries built by the authors: a domain dictionary, a stop word dictionary, and a thesaurus dictionary. Then, from the word statistics of the reports, frequent words, hot words, and new words are each defined and three corresponding feature selection methods are proposed; a method for calculating social vitality is proposed based on the new words; and the time series of the feature words are clustered with a popular clustering method. In parallel, from the document information, the period 1954-2017 is divided into stages, and these stages are combined with the clustering results to discover time series patterns of feature words. The results show that the extracted frequent words, hot words, and new words reflect the common issues discussed in the reports, the hot issues and their evolution, and the year-to-year fluctuation of social vitality. From the clustering results and a reasonable division of the whole period into stages, nine time series patterns of feature words are identified in the State Council work reports.
Authors: Wei Wei (魏伟), Guo Chonghui (郭崇慧), Chen Jingfeng (陈静锋), Institute of Systems Engineering, Dalian University of Technology, Dalian 116024
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2018, No. 4, pp. 406-421 (16 pages)
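Funding: National Natural Science Foundation of China, General Program, "Research on Clustering Models and Algorithms in Electronic Medical Record Mining" (71771034)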
Keywords: government work report of the State Council; text mining; social transformation; pattern discovery; social vitality
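
The abstract describes building per-year word statistics and then selecting "new words" from them, but it does not give the paper's formal definitions. The following is only a minimal sketch under stated assumptions: the reports are taken to be already segmented into tokens, a term's time series is its relative frequency in each year, and a term is treated as "new" in the first year it appears. The function names `term_series` and `new_words_by_year` and the toy data are hypothetical, and none of this should be read as the authors' actual method.

```python
from collections import Counter


def term_series(tokens_by_year):
    """Return (years, {term: [relative frequency per year]}), years ascending."""
    years = sorted(tokens_by_year)
    counts = {y: Counter(tokens_by_year[y]) for y in years}
    vocab = set().union(*(counts[y] for y in years))
    series = {}
    for term in vocab:
        # max(..., 1) guards against an empty report for some year.
        series[term] = [
            counts[y][term] / max(len(tokens_by_year[y]), 1) for y in years
        ]
    return years, series


def new_words_by_year(tokens_by_year):
    """Assumed definition (not from the paper): a term is 'new' in the
    first year it appears. The earliest year is only a baseline, since
    every term in it counts as new."""
    seen = set()
    result = {}
    for year in sorted(tokens_by_year):
        current = set(tokens_by_year[year])
        result[year] = sorted(current - seen)
        seen |= current
    return result


# Toy, already-segmented input; invented for illustration only.
toy = {
    1954: ["industry", "agriculture", "industry"],
    1978: ["reform", "industry", "agriculture"],
    2017: ["innovation", "reform", "reform", "industry"],
}
years, series = term_series(toy)
new_words = new_words_by_year(toy)
```

The per-term frequency vectors in `series` are the kind of input a time series clustering step could consume; which clustering method and which distance the paper uses is not stated in the abstract, so that step is left out here.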