Abstract
Intelligent analysis of large collections of government work reports makes it possible to grasp the relationships among their underlying factors quickly and thoroughly, and so helps decision makers reach sound judgments. This paper analyzes government work reports issued at the county/district level and above across 31 provinces and municipalities in China over the 18 years since 2000. It first proposes a framework for determining the optimal embedding dimension, based on a homeomorphic document embedding space. Building on this framework, it studies the province-level separability and similarity of documents under the optimal-embedding-dimension homeomorphism model, and then presents experimental results and analysis of clustering separability and similarity differences among the reports. The optimal document embedding vectors obtained by the model effectively partition the reports into province-level subspaces, and differences in the province-level time-series similarity of local government work reports highlight gaps among regions in politics, economy, education, culture, and other areas. The experiments also find that, when retrieving sets of similar documents, Euclidean distance on normalized document vectors is equivalent to the traditional cosine distance. The proposed document embedding homeomorphism framework offers reference and practical value for building smart government services in China, and can also support deep information mining, report reinterpretation, and intelligent decision making in other multi-class document tasks, such as announcements of listed companies.
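The equivalence between Euclidean and cosine distance noted in the abstract can be illustrated with a minimal sketch. It assumes that the normalization in question scales each document vector to unit L2 length, which the abstract does not spell out. Under that assumption, any two unit vectors a and b satisfy ||a - b||^2 = 2(1 - cos(a, b)), so both distances rank candidate documents in the same order. The random vectors and the function names below are illustrative stand-ins, not the paper's data or code.

```python
import numpy as np

def l2_normalize(vectors):
    """Scale each row to unit Euclidean length (assumed normalization)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms

def cosine_distances(query, docs):
    """Return 1 - cosine similarity between the query and each row of docs."""
    sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    return 1.0 - sims

def euclidean_distances(query, docs):
    """Return the Euclidean distance between the query and each row of docs."""
    return np.linalg.norm(docs - query, axis=1)

# Synthetic stand-ins for document embedding vectors (hypothetical data).
rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 16))
unit = l2_normalize(raw)
query, docs = unit[0], unit[1:]

d_cos = cosine_distances(query, docs)
d_euc = euclidean_distances(query, docs)

# For unit vectors, squared Euclidean distance equals twice the cosine distance.
identity_holds = np.allclose(d_euc ** 2, 2.0 * d_cos)

# Because the mapping is monotonic, the nearest neighbours coincide.
top5_cos = np.argsort(d_cos)[:5]
top5_euc = np.argsort(d_euc)[:5]

print("identity holds:", identity_holds)
print("top-5 by cosine:   ", top5_cos)
print("top-5 by Euclidean:", top5_euc)
```

The identity depends on the vectors having unit length. For unnormalized vectors the two distances can rank neighbours differently, which is consistent with the abstract restricting the finding to normalized document vectors.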
Source
Computer Science and Application (《计算机科学与应用》)
2020, No. 6, pp. 1194-1208 (15 pages)
Keywords
Government Work Report
Document Embedding
Homeomorphism
Dimension