Abstract
[Purpose/significance] To alleviate policy information overload, improve the efficiency of policy reading, and strengthen the practical effect of policies, this study condenses the core information of policy texts and generates high-quality summaries. [Method/process] The study integrates unsupervised models and algorithms to propose an improved, sentence-vector-based strategy for extracting key sentences from policy texts. It incorporates dependency syntactic structure into summary generation, extracting the dependency syntax tree of each policy text and its dependency features to enhance the RoBERTa-based representation of policy text. In the Seq2Seq-based summary generation model, it introduces the PGN model and an improved SIMCLS model to select the best candidate summary, improving both model performance and the quality of the generated summaries. [Result/conclusion] Summary generation experiments on State Council policy texts show that the proposed model and strategy, which integrate key sentences and dependency syntax, significantly outperform the other models on ROUGE metrics, and the example summaries show good semantic and linguistic quality. However, the coherence of the generated summaries needs improvement, few suitable reference summaries are available for training, and the evaluation of summary generation needs further refinement.
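The abstract reports results on ROUGE metrics without spelling out the computation. As a rough illustration only, the sketch below computes ROUGE-N from clipped n-gram overlap between a candidate summary and a reference summary. The function names `ngrams` and `rouge_n`, the toy sentences, and the whitespace tokenization are assumptions made for this example; the paper's actual evaluation setup (tokenizer, ROUGE variants, averaging) is not described here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N recall, precision, and F1 from two token lists."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Clipped overlap: an n-gram counts only as often as it occurs in both texts.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Toy example (hypothetical sentences). Chinese policy text would first need
# word or character segmentation instead of whitespace splitting.
reference = "the policy supports small enterprises".split()
candidate = "the policy supports enterprises".split()
scores_1 = rouge_n(candidate, reference, n=1)
scores_2 = rouge_n(candidate, reference, n=2)
```

In this toy case every candidate unigram appears in the reference, so ROUGE-1 precision is 1.0 while recall is 4/5. The bigram score is lower because dropping "small" breaks two of the reference bigrams.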
Authors
胡吉明
杨云
Hu Jiming; Yang Yun (School of Information Management, Wuhan University, Wuhan, Hubei 430072; Institute of Data Intelligence, Wuhan University, Wuhan, Hubei 430072; Department of Management Sciences, National Natural Science Foundation of China, Beijing 100085)
Source
《情报理论与实践》
CSSCI
Peking University Core Journal
2024, No. 11, pp. 177-185 (9 pages)
Information Studies: Theory & Application
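Funding
An outcome of the National Social Science Fund of China General Project "Research on Policy Text Summary Generation Based on Structure and Function" (Grant No. 23BTQ081).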
Keywords
policy text
summary generation
key sentence extraction
dependency syntax