Abstract
Civil judgment documents differ from news texts in both text structure and the distribution of important information. To address this, this paper proposes a method based on BERT (Bidirectional Encoder Representations from Transformers) for generating structured summaries of civil judgment documents, combining coarse-grained and fine-grained extraction. First, a coarse-grained extraction method extracts the important modules of a judgment document, which preserves the text structure. Then, a fine-grained extractive summarization model is built with a BERT-based sequence labeling method; it further extracts information from the important modules at the sentence level to construct the final summary. Experiments show that the proposed method achieves better summary generation performance than either coarse-grained extraction or fine-grained extraction alone.
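The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the module markers in MODULE_MARKERS, the function names, and above all keyword_labeler, which is only a stand-in for the paper's BERT-based sequence labeling model (any callable that returns one 0/1 label per sentence could be substituted). The record does not give the paper's actual module inventory or model details.

```python
import re
from typing import Callable, Dict, List

# Hypothetical module headings; the paper's real module inventory is not given here.
MODULE_MARKERS: Dict[str, str] = {
    "claims": "Plaintiff claims:",
    "findings": "The court finds:",
    "ruling": "The judgment is as follows:",
}

# A labeler maps a list of sentences to one 0/1 label per sentence.
Labeler = Callable[[List[str]], List[int]]


def coarse_extract(document: str, markers: Dict[str, str] = MODULE_MARKERS) -> Dict[str, str]:
    """Coarse-grained step: cut the document into named modules at marker positions."""
    found = sorted(
        (document.find(marker), name, marker)
        for name, marker in markers.items()
        if marker in document
    )
    modules: Dict[str, str] = {}
    for i, (start, name, marker) in enumerate(found):
        end = found[i + 1][0] if i + 1 < len(found) else len(document)
        modules[name] = document[start + len(marker):end].strip()
    return modules  # text before the first marker is dropped


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter on Latin and CJK sentence-final punctuation."""
    parts = re.findall(r"[^.!?。!?]+[.!?。!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def fine_extract(sentences: List[str], labeler: Labeler) -> List[str]:
    """Fine-grained step: keep the sentences the labeler marks with 1."""
    labels = labeler(sentences)
    return [s for s, y in zip(sentences, labels) if y == 1]


def summarize(document: str, labeler: Labeler) -> Dict[str, str]:
    """Run both steps; the result keeps one entry per module, in document order."""
    modules = coarse_extract(document)
    return {
        name: " ".join(fine_extract(split_sentences(body), labeler))
        for name, body in modules.items()
    }


# Stand-in for the BERT sequence labeling model: a toy cue-word rule.
CUE_WORDS = ("repay", "ordered", "loan")


def keyword_labeler(sentences: List[str]) -> List[int]:
    return [1 if any(w in s.lower() for w in CUE_WORDS) else 0 for s in sentences]
```

Calling summarize(document, keyword_labeler) returns a dictionary keyed by module name, so the summary retains the document's module structure while each module is reduced to its selected sentences.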
Authors
魏鑫炀
唐向红
WEI Xinyang; TANG Xianghong (College of Computer Science and Technology, Guizhou University, Guiyang 550025, China; Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China)
Source
《软件工程》
2022, Issue 5, pp. 1-4 (4 pages)
Software Engineering
Keywords
judicial field
judgment documents
extractive text summarization
sequence labeling