Journal articles: 228 results found
1. Using LSA and text segmentation to improve automatic Chinese dialogue text summarization (cited by 3)
Authors: LIU Chuan-han, WANG Yong-cheng, ZHENG Fei, LIU De-rong. Journal of Zhejiang University-Science A (Applied Physics & Engineering), 2007, no. 1, pp. 79-87 (9 pages). Indexed in: SCIE, EI, CAS, CSCD.
Automatic Chinese text summarization for dialogue-style text is a relatively new research area. In this paper, Latent Semantic Analysis (LSA) is first used to extract semantic knowledge from a given document, and all question paragraphs are identified. An automatic text segmentation approach analogous to TextTiling is then used to improve the precision of correlating question paragraphs with answer paragraphs. Finally, some "important" sentences are extracted from the generic content and the question-answer pairs to generate a complete summary. Experimental results show that the approach is highly efficient and significantly improves the coherence of the summary without compromising informativeness.
Keywords: automatic text summarization; latent semantic analysis (LSA); text segmentation; dialogue style; coherence; question-answer pairs
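The abstract describes a segmentation step "analogous to TextTiling". The following is a minimal TextTiling-style sketch of that general idea, not the authors' implementation. It makes several assumptions:

- Paragraphs are tokenized by whitespace.
- The bags of words on either side of each paragraph gap are compared with cosine similarity.
- Dips in that similarity become depth scores, found by climbing to the nearest peak on each side.
- A boundary is placed where the depth exceeds the mean minus half a standard deviation, the cutoff used in Hearst's TextTiling.

The function names and the `window` parameter are hypothetical.

```python
import math
import statistics
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(paragraphs, window=1):
    """Lexical similarity across each gap between consecutive paragraphs."""
    bags = [Counter(p.lower().split()) for p in paragraphs]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Depth of each similarity valley: climb to the nearest peak on each side."""
    depths = []
    for i, s in enumerate(scores):
        left = i
        while left > 0 and scores[left - 1] >= scores[left]:
            left -= 1
        right = i
        while right < len(scores) - 1 and scores[right + 1] >= scores[right]:
            right += 1
        depths.append((scores[left] - s) + (scores[right] - s))
    return depths


def segment(paragraphs, window=1):
    """Split paragraphs into topical segments at unusually deep valleys."""
    if len(paragraphs) < 3:
        return [list(paragraphs)]
    depths = depth_scores(gap_scores(paragraphs, window))
    cutoff = statistics.mean(depths) - statistics.pstdev(depths) / 2
    segments, current = [], [paragraphs[0]]
    for i, depth in enumerate(depths):
        if depth > cutoff:  # gap i lies between paragraph i and paragraph i + 1
            segments.append(current)
            current = []
        current.append(paragraphs[i + 1])
    segments.append(current)
    return segments
```

In the paper's setting, segments like these would help pair question paragraphs with answer paragraphs. The pairing and the LSA step are not shown.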
2. Study on controllability of semantic accessibility scale from the internet-based system of automatic text summarization and evaluation (cited by 2)
Authors: DU Jia-li, YU Ping-fang, ZHAO Hong-yan, XU Jing. 通讯和计算机(中英文版) (Communications and Computers, Chinese-English edition), 2008, no. 9, pp. 54-60 (7 pages).
Keywords: communication technology; computer technology; control methods; automation systems
3. Insertion of Ontological Knowledge to Improve Automatic Summarization Extraction Methods
Authors: Jésus Antonio Motta, Laurence Capus, Nicole Tourigny. Journal of Intelligent Learning Systems and Applications, 2011, no. 3, pp. 131-138 (8 pages).
The vast availability of information sources has created a need for research on automatic summarization. Current methods work either by extraction or by abstraction. Extraction methods are attractive because they are robust and independent of the language used. An extractive summary is obtained by selecting sentences from the original source based on their information content. This selection can be automated using a classification function induced by a machine learning algorithm. The function classifies sentences into two groups, important and non-important, and the important sentences form the summary. However, the performance of this function depends directly on the training set used to induce it. This paper proposes an original way of optimizing this training set by inserting lexemes obtained from ontological knowledge bases, so that the optimized training set is reinforced by ontological knowledge. An experiment with four machine learning algorithms was conducted to validate this proposal, and the improvement achieved is clearly significant for each of these algorithms.
Keywords: automatic summarization; ontology; machine learning; extraction method
4. Using AdaBoost Meta-Learning Algorithm for Medical News Multi-Document Summarization (cited by 1)
Author: Mahdi Gholami Mehr. Intelligent Information Management, 2013, no. 6, pp. 182-190 (9 pages).
Automatic text summarization reduces a text document, or a larger corpus of multiple documents, to a short set of sentences or paragraphs that convey the main meaning of the text. This paper addresses multi-document summarization. It differs from single-document summarization in that compression, speed, redundancy and passage selection are critical to forming useful summaries. Medical experts find it difficult to read all online medical news because of its volume and variety, so automatic multi-document summarization can make information on the web easier to study. We therefore propose a new summarization approach based on the machine learning meta-learning algorithm AdaBoost. We treat a document as a set of sentences, and the learning algorithm must learn to classify sentences as positive or negative examples based on their scores. For this learning task, we apply AdaBoost with a C4.5 decision tree as the base learner. In our experiment, we use 450 news articles downloaded from various medical websites and compare our results with some existing approaches.
Keywords: multi-document summarization; machine learning; decision trees; AdaBoost; C4.5; medical document summarization
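The boosting loop the abstract relies on can be illustrated with a minimal discrete-AdaBoost sketch. To stay short, it swaps the paper's C4.5 base learner for one-level decision stumps, so it shows the meta-learning procedure rather than the authors' system. The sentence feature vectors and the +1/-1 labels (summary-worthy or not) are hypothetical inputs.

```python
import math


def train_stump(X, y, w):
    """Find the one-feature threshold rule with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[f] >= threshold else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, threshold, polarity)
    return best


def stump_predict(stump, x):
    _, f, threshold, polarity = stump
    return polarity if x[f] >= threshold else -polarity


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels must be +1 (summary-worthy) or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # guard against log(0) for a perfect stump
        if err >= 0.5:
            break  # no better than chance: stop boosting
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified sentences gain weight; correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Sentences predicted as +1 would then be candidates for the summary. Replacing `train_stump` with a weighted decision-tree learner would bring the sketch closer to the setup described in the abstract.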
5. Constructing a taxonomy to support multi-document summarization of dissertation abstracts
Authors: KHOO Christopher S.G., GOH Dion H. Journal of Zhejiang University-Science A (Applied Physics & Engineering), 2005, no. 11, pp. 1258-1267 (10 pages). Indexed in: SCIE, EI, CAS, CSCD.
This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The method first uses macro-level and micro-level discourse structure to identify important information that can be extracted from dissertation abstracts. It then uses a variable-based framework to integrate and organize the extracted information across abstracts. This framework has a hierarchical structure and focuses on the research concepts found in sociology dissertation abstracts and the research relationships between them. A taxonomy is constructed to support the summarization process in two ways: (1) helping to identify important concepts and relations expressed in the text, and (2) providing a structure for linking similar concepts in different abstracts. This paper describes the variable-based framework and the summarization process, and then reports the construction of the taxonomy. An example shows how the constructed taxonomy is used to identify important concepts and to integrate the concepts extracted from different abstracts.
Keywords: text summarization; automatic multi-document summarization; variable-based framework; digital library
6. Density peaks clustering based integrate framework for multi-document summarization (cited by 2)
Authors: Baoyan Wang, Jian Zhang, Yi Liu, Yuexian Zou. CAAI Transactions on Intelligence Technology, 2017, no. 1, pp. 26-30 (5 pages).
We present a novel unsupervised integrated score framework that generates generic extractive multi-document summaries by ranking sentences with a dynamic programming (DP) strategy. Cluster-based methods proposed by other researchers tend to ignore the informativeness of words when generating summaries. Our framework therefore jointly considers the relevance, diversity, informativeness and length constraint of sentences. We apply Density Peaks Clustering (DPC) to obtain relevance scores and diversity scores of sentences simultaneously. On DUC2004, our framework achieves the best performance, with a ROUGE-1 score of 0.396, a ROUGE-2 score of 0.094 and a ROUGE-SU4 score of 0.143. It outperforms a series of popular baselines such as DUC Best, FGB [7] and BSTM [10].
Keywords: multi-document summarization; integrated score framework; density peaks clustering; sentence ranking
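Density peaks clustering describes each point with two quantities: a local density rho, and a distance delta to the nearest point of higher density. The abstract says DPC yields relevance and diversity scores at the same time. Reading rho as relevance and delta as diversity is an assumption made here, not something the abstract states.

The sketch below computes both quantities for arbitrary vectors (for example, sentence embeddings). The cutoff `d_c`, the index-based tie-breaking and the function names are illustrative choices. The informativeness term, the length constraint and the dynamic-programming selection are not shown.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) for each point, as in density peaks clustering.

    rho[i]   = number of other points closer than the cutoff distance d_c
    delta[i] = distance to the nearest point of higher density
               (for the densest point: its largest distance to any point)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Break density ties by index so every point except the first has a denser neighbour.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        denser = order[:rank]
        delta[i] = min(dist[i][j] for j in denser) if denser else max(dist[i])
    return rho, delta


def top_peaks(rho, delta, k):
    """Indices of the k points with the largest rho * delta (candidate cluster centres)."""
    return sorted(range(len(rho)), key=lambda i: -rho[i] * delta[i])[:k]
```

Points that are both dense and far from any denser point stand out as cluster centres. This is the property that lets DPC score how central a sentence is and how distinct it is in one pass.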
7. Unsupervised Graph-Based Tibetan Multi-Document Summarization
Authors: Xiaodong Yan, Yiqin Wang, Wei Song, Xiaobing Zhao, A. Run, Yang Yanxing. Computers, Materials & Continua, 2022, no. 10, pp. 1769-1781 (13 pages). Indexed in: SCIE, EI.
Text summarization creates a subset that represents the most important or relevant information in the original content, which effectively reduces information redundancy. Neural network methods have recently achieved good results on text summarization in both Chinese and English. However, research on text summarization in low-resource languages, especially Tibetan, is still at an exploratory stage. Moreover, there is no large-scale annotated corpus for this task, and the lack of datasets severely limits the development of low-resource text summarization. In this setting, unsupervised learning approaches are more appealing for low-resource languages because they do not require labeled data.

In this paper, we propose an unsupervised graph-based Tibetan multi-document summarization method. It divides a large number of Tibetan news documents into topics and extracts a summary for each topic. Summaries obtained with traditional graph-based methods are highly redundant, and their division of documents into topics is not fine-grained enough.

For topic division, we adopt a two-level clustering method that converts the original documents into document-level and sentence-level graphs. We then take both linguistic and deep representations into account and integrate an external corpus into the graph to obtain semantic sentence clusters. This addresses the shortcomings of the traditional K-Means clustering method and yields more fine-grained document clustering. Next, the sentence clusters are modeled as graphs, and sentence nodes are reweighted according to topic semantic information and the impact of topic features on sentences, so that summaries with higher topic relevance are extracted.

To promote the development of Tibetan text summarization and to meet researchers' need for high-quality Tibetan summarization datasets, this paper manually constructs a Tibetan summarization dataset and carries out experiments on it. The experimental results show that our method effectively improves summary quality and is competitive with previous unsupervised methods.
Keywords: multi-document summarization; text clustering; topic feature fusion; graph model
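Graph-based extractive methods like the one above build on ranking sentences inside a similarity graph. The following is a minimal TextRank-style sketch of that underlying step for a single topic cluster: a weighted PageRank over cosine similarities of bag-of-words vectors. It is not the authors' method. Their two-level clustering, external-corpus integration and topic-based reweighting are not shown, and the damping factor and iteration count are conventional defaults chosen here.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over a sentence-similarity graph (TextRank style)."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    weight = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
              for i in range(n)]
    out_weight = [sum(row) for row in weight]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence passes its score to neighbours in proportion to edge weight.
        scores = [
            (1 - damping) / n
            + damping * sum(weight[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores
```

A sentence with no lexical overlap ends up with only the teleport share `(1 - damping) / n`. Sentences that share vocabulary with the rest of the cluster rise to the top.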
8. Research on multi-document summarization based on latent semantic indexing
Authors: 秦兵, 刘挺, 张宇, 李生. Journal of Harbin Institute of Technology (New Series), 2005, no. 1, pp. 91-94 (4 pages). Indexed in: EI, CAS.
A multi-document summarization method based on Latent Semantic Indexing (LSI) is proposed. The method combines several reports on the same issue into a term-by-sentence matrix. It uses Singular Value Decomposition (SVD) to reduce the dimension of the matrix and extract features, and then computes sentence similarity. The sentences are clustered according to their similarity, and the centroid sentence of each cluster is selected. Finally, the selected sentences are ordered to generate the summary. Evaluation results are presented which indicate that the proposed method is efficient.
Keywords: multi-document summarization; LSI (latent semantic indexing); clustering
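The abstract outlines an LSI pipeline:

1. Build a term-by-sentence matrix.
2. Reduce its dimension with SVD.
3. Cluster sentences by similarity.
4. Pick one centroid sentence per cluster.

The following is a minimal pure-Python sketch of that pipeline under stated assumptions, not the authors' code:

- The matrix holds raw term counts, with no weighting.
- Only the top-k singular directions are computed, using power iteration with deflation on A^T A instead of a full SVD.
- Clustering is greedy: each sentence is compared with every cluster's first sentence, and `threshold` is a hypothetical similarity cutoff.
- The final sentence-ordering step is left out.

```python
import math
import random
from collections import Counter


def term_sentence_matrix(sentences):
    """Rows = terms, columns = sentences, entries = raw term counts."""
    bags = [Counter(s.lower().split()) for s in sentences]
    vocab = sorted(set().union(*bags))
    return [[bag[t] for bag in bags] for t in vocab]


def latent_sentence_vectors(A, k, iterations=300, seed=0):
    """Project sentences onto the top-k singular directions of A.

    Power iteration with deflation on B = A^T A: its eigenvectors are the right
    singular vectors of A and its eigenvalues are the squared singular values.
    """
    n = len(A[0])
    B = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    rng = random.Random(seed)
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iterations):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                return coords  # rank exhausted: no further directions
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        sigma = math.sqrt(max(eigenvalue, 0.0))
        for j in range(n):
            coords[j].append(sigma * v[j])  # sentence j's coordinate: sigma * v_j
        # Deflate so the next pass finds the next singular direction.
        B = [[B[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lsi_summary(sentences, k=2, threshold=0.9):
    """Cluster sentences by latent similarity; return one centroid index per cluster."""
    vecs = latent_sentence_vectors(term_sentence_matrix(sentences), k)
    sim = [[cosine(a, b) for b in vecs] for a in vecs]
    clusters = []
    for i in range(len(sentences)):
        for cluster in clusters:
            if sim[i][cluster[0]] >= threshold:  # compare with the cluster's first member
                cluster.append(i)
                break
        else:
            clusters.append([i])
    # Centroid = the member most similar, in total, to the rest of its cluster.
    return [max(c, key=lambda i: sum(sim[i][j] for j in c)) for c in clusters]
```

For real corpora, a library SVD (for example, NumPy's) would replace the power iteration. The clustering and centroid logic would stay the same.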
9. Two-Stage Sentence Selection Approach for Multi-Document Summarization
Authors: Zhang Shu, Zhao Tiejun, Zheng Dequan, Zhao Hua. Journal of Electronics (China), 2008, no. 4, pp. 562-567 (6 pages).
The traditional approach to multi-document summarization builds a summary by adding sentences. In contrast, a two-stage sentence selection approach is proposed that generates the summary by deleting sentences from a candidate sentence set. The approach has two stages: acquiring a candidate sentence set and selecting sentences optimally. In the first stage, the candidate sentence set is obtained by a redundancy-based sentence selection approach. In the second stage, sentences are deleted from the candidate set according to their contribution to the whole set until the specified summary length is reached. On a test corpus, the ROUGE scores of summaries produced by the proposed approach confirm its validity compared with the traditional sentence selection method. The influence of the choice of tokens in the two-stage approach on the quality of the generated summaries is also analyzed.
Keywords: two-stage; sentence selection approach; multi-document summarization
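The abstract does not define a sentence's "contribution to the whole set". The sketch below is therefore a hypothetical instantiation of the two-stage idea. It makes the following assumptions:

- Contribution is measured as weighted word coverage, with each word weighted by its frequency across the input.
- Redundancy in stage one is a Jaccard-overlap filter.
- The candidate budget `candidate_words` is an invented parameter.

Stage one adds non-redundant sentences until the candidate set exceeds the budget. Stage two repeatedly deletes the sentence whose removal loses the least coverage, until the summary fits `max_words`.

```python
from collections import Counter


def tokens(sentence):
    return sentence.lower().split()


def coverage(indices, sentences, weight):
    """Total weight of the distinct words covered by the chosen sentences."""
    covered = set()
    for i in indices:
        covered.update(tokens(sentences[i]))
    return sum(weight[w] for w in covered)


def jaccard(a, b):
    A, B = set(tokens(a)), set(tokens(b))
    return len(A & B) / len(A | B) if A | B else 0.0


def length(indices, sentences):
    return sum(len(tokens(sentences[i])) for i in indices)


def two_stage_select(sentences, max_words, candidate_words, redundancy=0.5):
    # Assumed word weight: frequency across all input sentences.
    weight = Counter(w for s in sentences for w in tokens(s))
    # Stage 1: redundancy-based selection of a candidate set larger than the summary.
    order = sorted(range(len(sentences)), key=lambda i: -coverage([i], sentences, weight))
    candidates = []
    for i in order:
        if length(candidates, sentences) >= candidate_words:
            break
        if all(jaccard(sentences[i], sentences[j]) <= redundancy for j in candidates):
            candidates.append(i)
    # Stage 2: delete the least-contributing sentence until the length limit is met.
    while candidates and length(candidates, sentences) > max_words:
        total = coverage(candidates, sentences, weight)
        losses = {j: total - coverage([k for k in candidates if k != j], sentences, weight)
                  for j in candidates}
        candidates.remove(min(candidates, key=losses.get))
    return sorted(candidates)
```

Deleting by marginal loss lets stage two drop a sentence whose words are already covered by the others. A purely additive greedy method can commit to such a sentence early.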
10. Multi-Document Summarization Model Based on Integer Linear Programming
Authors: Rasim Alguliev, Ramiz Aliguliyev, Makrufa Hajirahimova. Intelligent Control and Automation, 2010, no. 2, pp. 105-111 (7 pages).
This paper proposes an extractive generic text summarization model that generates summaries by selecting sentences according to their scores. Sentence scores reflect how extensively each sentence covers the main content of the text, and summaries are created by extracting the highest-scoring sentences from the original document. The model is formalized as a multiobjective integer programming problem. An advantage of this model is that it covers the main content of the source(s) while producing less redundancy in the generated summaries. To select such sentences, the model uses both the similarity of each sentence to the original document and the similarity between sentences. Performance is evaluated by comparing the summarization outputs with the manual summaries of the DUC2004 dataset. Experiments show that the proposed approach outperforms related methods.
Keywords: multi-document summarization; content coverage; less redundancy; integer linear programming
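The abstract names its two ingredients: similarity of each sentence to the document, which rewards coverage, and similarity between selected sentences, which penalizes redundancy. It does not give the exact objective. The sketch below assumes a weighted-sum scalarization of the two objectives with a hypothetical weight `redundancy_weight`, plus a word-count budget.

Instead of calling an ILP solver, it enumerates every subset, which is exact but only feasible for a handful of sentences. In a real ILP, the pairwise product x_i * x_j is usually linearized with an auxiliary binary y_ij under the constraints y_ij <= x_i, y_ij <= x_j and y_ij >= x_i + x_j - 1.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def best_summary(sentences, max_words, redundancy_weight=1.0):
    """Exhaustively maximize coverage minus redundancy under a word budget.

    With x_i = 1 if sentence i is selected, the objective is
        sum_i x_i * sim(s_i, D) - lambda * sum_{i<j} x_i * x_j * sim(s_i, s_j)
    subject to sum_i x_i * len(s_i) <= max_words.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    document = sum(bags, Counter())
    relevance = [cosine(bag, document) for bag in bags]
    best, best_score = [], float("-inf")
    for size in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), size):
            if sum(len(sentences[i].split()) for i in subset) > max_words:
                continue  # violates the length constraint
            score = sum(relevance[i] for i in subset) - redundancy_weight * sum(
                cosine(bags[i], bags[j]) for i, j in combinations(subset, 2))
            if score > best_score:
                best, best_score = list(subset), score
    return best
```

With two near-duplicate sentences in the input, the redundancy term makes the selection prefer one of them plus an unrelated sentence over keeping both duplicates.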
11. Weakly Supervised Abstractive Summarization with Enhancing Factual Consistency for Chinese Complaint Reports
Authors: Ren Tao, Chen Shuang. Computers, Materials & Continua, 2023, no. 6, pp. 6201-6217 (17 pages). Indexed in: SCIE, EI.
Complaint reports come in great variety and reflect subjective information expressed by citizens. A key challenge of text summarization for complaint reports is ensuring the factual consistency of the generated summaries. This paper therefore proposes a simple, weakly supervised framework that takes factual consistency into account. It generates summaries of city-based complaint reports without pre-labeled sentences or words. Furthermore, it considers the importance of entities in complaint reports to ensure the factual consistency of the summary. Experiments were run on customer review datasets (Yelp and Amazon) and a complaint report dataset (complaint reports from Shenyang, China). The results show that the proposed framework outperforms state-of-the-art approaches in ROUGE scores and human evaluation, which demonstrates the effectiveness of the approach in helping to handle complaint reports.
Keywords: automatic summarization; abstractive summarization; weakly supervised training; entity recognition
12. Support Vector Machine Based Handwritten Hindi Character Recognition and Summarization
Authors: Sunil Dhankhar, Mukesh Kumar Gupta, Fida Hussain Memon, Surbhi Bhatia, Pankaj Dadheech, Arwa Mashat. Computer Systems Science & Engineering, 2022, no. 10, pp. 397-412 (16 pages). Indexed in: SCIE, EI.
In today's digital era, text often takes the form of images. This research addresses the problem of recognizing such text using a support vector machine (SVM). Much work has been done on handwritten character recognition for English, but very little for the under-resourced Hindi language. A method is developed that identifies Hindi characters using morphology, edge detection, histograms of oriented gradients (HOG) and SVM classifiers, and then creates a summary. SVM rank uses the summary to extract essential phrases based on paragraph position, phrase position, numerical data, inverted commas, sentence length and keyword features. The primary goal of the SVM optimization function is to reduce the number of features by eliminating unnecessary and redundant ones; the second goal is to maintain or improve the classification system's performance. The experiment used news articles from various genres, such as Bollywood, politics and sports. The proposed method achieves 96.97% accuracy for Hindi character recognition, which compares well with baseline approaches, and the system-generated summaries are compared with human summaries. The evaluation shows a precision of 72% at a compression ratio of 50% and a precision of 60% at a compression ratio of 25%, a reasonable result compared with state-of-the-art methods.
Keywords: support vector machine (SVM); optimization; precision; Hindi character recognition; optical character recognition (OCR); automatic summarization; compression ratio
13. Class Summary Generation Based on Hierarchical Representation and Context Enhancement (cited by 2)
Authors: 陈豪伶, 虞慧群, 范贵生, 李明辰, 黄子杰. 计算机研究与发展 (Journal of Computer Research and Development), 2024, no. 2, pp. 307-323 (17 pages). Indexed in: EI, CSCD, PKU Core.
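A code summary is a natural-language description of source code, and high-quality code summaries help developers understand programs more efficiently. In recent years, research on automatic code summarization has focused on generating summaries for method-level code snippets. However, in object-oriented languages such as Java, the class is the basic building block of a project. To address this, a class summary generation method based on hierarchical representation and context enhancement, HRCE (hierarchical representation and context enhancement), is proposed. The authors also build a class summary dataset of 358,992 ⟨Java class, context, summary⟩ data pairs.

HRCE uses a code simplification strategy to remove non-essential code from a class, which shortens the code. It then models each level of the class hierarchy (class signature, attributes and methods) separately, capturing both the semantic information and the hierarchical structure of the class. In addition, the signatures and summaries of parent classes are extracted from the project to characterize the context on which the class depends. Experiments show that the model can represent the semantics and hierarchical structure of code and can draw information from both inside and outside the target class. HRCE outperforms all baseline models on evaluation metrics including BLEU, METEOR and ROUGE-L.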
Keywords: automatic code summarization; hierarchical representation; context enhancement; deep learning; class summary
14. Product Summary Extraction Model Fusing Multimodal Information
Authors: 赵强, 王中卿, 王红玲. 计算机应用 (Journal of Computer Applications), 2024, no. 1, pp. 73-78 (6 pages). Indexed in: CSCD, PKU Core.
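On online shopping platforms, concise, accurate and useful product summaries are essential to a good shopping experience. Online shoppers cannot examine the physical product, so product images carry important visual information beyond the textual product description. Product summaries that fuse multimodal information, including product text and product images, are therefore valuable for online shopping. To fuse textual product descriptions with product images, a product summary extraction model that fuses multimodal information is proposed. In typical product summarization tasks, the input contains only the textual description. This model instead introduces product images as an additional source of information, which makes the extracted summaries richer.

Specifically, pre-trained models first produce feature representations of the product text description and the product image. A textual feature representation is extracted for each sentence of the description, and an overall visual feature representation of the product is extracted from the product image. A low-rank tensor-based multimodal fusion method then fuses each sentence's textual features with the overall visual features to obtain a multimodal representation for each sentence. Finally, the multimodal representations of all sentences are fed into a summary generator to produce the final product summary.

Comparative experiments were run on the CEPSUM (Chinese E-commerce Product SUMmarization) 2.0 dataset. Averaged over its three data subsets, the model's ROUGE-1 is 3.12 percentage points higher than TextRank and 1.75 percentage points higher than BERTSUMExt (BERT SUMmarization Extractive). The results show that fusing product text and image information is effective for product summarization, and the model performs well on the ROUGE metrics.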
Keywords: product summarization; multimodal summarization; extractive summarization; multimodal fusion; automatic summarization
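The abstract fuses each sentence's text features with the product's image features through a low-rank tensor method. A common low-rank fusion formulation works as follows:

- Append a constant 1 to each modality vector, so unimodal terms survive alongside cross-modal interactions.
- Project each extended vector with R modality-specific factor matrices.
- Multiply the two projections element-wise and sum over the R rank components.

This equals applying a rank-R weight tensor to the outer product of the two extended vectors without ever forming it. The sketch below assumes that formulation. The factor matrices would be learned in practice, and the paper's exact variant, feature extractors and summary generator are not shown.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def low_rank_fusion(text_vec, image_vec, text_factors, image_factors):
    """Fuse one sentence's text features with the product's image features.

    h = sum_r (W_t^(r) [t; 1]) * (W_v^(r) [v; 1])   (element-wise product)
    Each W_t^(r) has shape (d_out, len(t) + 1); each W_v^(r) has shape (d_out, len(v) + 1).
    """
    t = list(text_vec) + [1.0]  # the appended 1 keeps text-only terms
    v = list(image_vec) + [1.0]  # ... and image-only terms
    fused = None
    for W_t, W_v in zip(text_factors, image_factors):
        term = [a * b for a, b in zip(matvec(W_t, t), matvec(W_v, v))]
        fused = term if fused is None else [f + x for f, x in zip(fused, term)]
    return fused
```

Per the abstract, the same image vector is reused for every sentence of a product, so each sentence gets its own fused representation before summary extraction.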
15. Extractive-Abstractive Automatic Summarization Model for Judicial Documents
Authors: 陈炫言, 安娜, 孙宇, 周炼赤. 计算机工程与设计 (Computer Engineering and Design), 2024, no. 4, pp. 1117-1125 (9 pages). Indexed in: PKU Core.
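Extractive summaries tend to stitch key information together awkwardly, while abstractive summarization tends to overlook important information when the source text is long. To address these problems, this work studies how to combine extractive and abstractive summarization. Extractive summarization can pick out the key information in a text and shorten the source, while abstractive summarization can reduce information loss between sequences and strengthen the connections within the text. Building on this analysis, an extractive-abstractive automatic summarization model for judicial documents is proposed. It combines the strengths of both models and avoids two problems of either model used alone: repeated key information and ungrammatical recombined passages. It also ensures that the content extracted from legal documents is faithful and complete. Experiments on a large public dataset of legal judgment documents show that the model achieves higher ROUGE scores, indicating that it improves summary quality.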
Keywords: automatic summarization; extractive; abstractive; algorithm fusion; judgment documents; legal domain; completeness and coherence
16. A Survey of Abstractive Text Summarization Based on Deep Learning (cited by 1)
Authors: 陈明轩, 肖诗斌, 王洪俊. 软件导刊 (Software Guide), 2024, no. 5, pp. 212-220 (9 pages).
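With the rapid growth of the Internet, text data is growing exponentially, posing unprecedented challenges for text processing tasks such as document management, text classification and information retrieval. Researchers have developed various abstractive text summarization (ATS) models based on deep learning (DL), and most state-of-the-art ATS models are built on DL architectures. Even so, DL-based abstractive summarization still lacks a comprehensive literature survey. This paper therefore provides one. It first outlines the concept of ATS. It then summarizes typical DL-based ATS models, the main problems they face and how those problems are handled. Finally, it highlights open challenges in ATS, current hot and difficult topics, and future research trends, to help researchers better understand recent progress in the field.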
Keywords: automatic text summarization; deep learning; abstractive summarization; natural language processing; natural language generation
17. Unsupervised Keyword Extraction Method Based on Text Summarization
Authors: 尤泽顺, 周喜, 董瑞, 张洋宁, 杨奉毅. 计算机工程与设计 (Computer Engineering and Design), 2024, no. 9, pp. 2779-2784 (6 pages). Indexed in: PKU Core.
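Embedding-based keyword extraction methods perform worse on long documents. To overcome this, a summarization-based method, SDERank (summarization-based document embedding rank), is proposed. The document embedding is the weighted sum of the sentence vectors, where each sentence is weighted by its semantic relevance to the document topic. Previous embedding-based methods ignore the relationships between candidate words when selecting keywords. SDERank+, an improved version of SDERank, addresses this by using the PageRank algorithm to extract co-occurrence weights between candidate words, which then correct the similarity scores. Experimental results on four widely used datasets show that SDERank and SDERank+ achieve F1 scores 2.2% and 3.29% higher on average than the previous best model, MDERank.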
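Keywords: automatic keyword extraction; text summarization; long document modeling; document topic analysis; semantic processing; weight optimization; vector similarity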
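The core of SDERank, as the abstract states it, is a document embedding built as a weighted sum of sentence vectors, with weights given by each sentence's semantic relevance to the document topic. The abstract does not say how the topic or the relevance is computed. This sketch assumes the following:

- The topic is the mean sentence vector.
- Each sentence's weight is its non-negative cosine similarity to that mean, normalized to sum to 1.
- Candidate phrases are then ranked by cosine similarity to the document embedding.

The toy vectors stand in for encoder outputs, and the SDERank+ PageRank correction is not shown.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def document_embedding(sentence_vecs):
    """Weighted sum of sentence vectors, weighted by relevance to the document topic.

    Assumption: topic = mean sentence vector; weight = non-negative cosine
    similarity to the topic, normalized so the weights sum to 1.
    """
    dim = len(sentence_vecs[0])
    topic = [sum(v[d] for v in sentence_vecs) / len(sentence_vecs) for d in range(dim)]
    weights = [max(cosine(v, topic), 0.0) for v in sentence_vecs]
    total = sum(weights) or 1.0
    return [sum(w * v[d] for w, v in zip(weights, sentence_vecs)) / total for d in range(dim)]


def rank_keywords(candidate_vecs, sentence_vecs):
    """Order candidate phrases (dict: phrase -> vector) by similarity to the document."""
    doc = document_embedding(sentence_vecs)
    return sorted(candidate_vecs, key=lambda c: cosine(candidate_vecs[c], doc), reverse=True)
```

Off-topic sentences receive lower weights, so they pull the document embedding less. This is the intended remedy for long documents, where a plain average over many sentences blurs the main topic.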
18. Research on AIGC-Driven Automatic Summarization of Ancient Chinese Books: From Natural Language Understanding to Generation
Authors: 吴娜, 刘畅, 刘江峰, 王东波. 图书馆论坛 (Library Tribune), 2024, no. 9, pp. 111-123 (13 pages). Indexed in: CSSCI, PKU Core.
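Automatic summarization, a key task in natural language processing, aims to compress long texts and address text information overload. This article uses the biographies in the Twenty-Four Histories (《二十四史》) as its corpus. From both extractive and abstractive perspectives, it explores feasible paths for applying AIGC-driven automatic summarization to ancient Chinese books. The aim is to support the creative transformation and innovative development of ancient-book resources and to help realize the value of their content under the digital humanities paradigm.

First, the GujiBERT, SikuBERT and BERT-ancient-Chinese models are used for semantic representation, and the LexRank algorithm ranks sentences by importance to extract summaries. Then the GPT-3.5-turbo, GPT-4 and ChatGLM3 models are used to generate summaries, and fine-tuned ChatGLM3 and GPT-3.5-turbo models are built. Finally, extractive summaries are evaluated with information coverage and information diversity metrics, and abstractive summaries with ROUGE and MAUVE.

The study finds three things. SikuBERT shows strong semantic representation and comprehension of classical Chinese in the extractive task. General-purpose large language models each have distinctive strengths in summarizing ancient books, but fall short in distilling the main theme. Fine-tuning GPT-3.5-turbo and ChatGLM3 on a small dataset effectively improves their summary generation.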
Keywords: re-creating the value of ancient books; automatic summarization; SikuBERT; large language models
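The extractive branch of this study ranks sentences with LexRank over BERT-family sentence representations. The following is a minimal thresholded-LexRank sketch:

- Connect any two sentences whose cosine similarity clears a threshold, with self-loops included.
- Normalize each node's edges by its degree.
- Run damped power iteration on the resulting graph.

The vectors are toy stand-ins for GujiBERT or SikuBERT outputs. The threshold, damping factor and helper `extract_summary` are illustrative choices, not the authors' settings.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexrank(vectors, threshold=0.3, damping=0.85, iterations=100):
    """Thresholded LexRank scores for sentence vectors."""
    n = len(vectors)
    # Unweighted edge wherever similarity clears the threshold (self-loops included).
    adjacency = [[1.0 if cosine(vectors[i], vectors[j]) >= threshold else 0.0
                  for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adjacency]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(adjacency[j][i] / degree[j] * scores[j]
                            for j in range(n) if degree[j] > 0)
            for i in range(n)
        ]
    return scores


def extract_summary(sentences, vectors, k, **kwargs):
    """Top-k sentences by LexRank score, returned in their original order."""
    scores = lexrank(vectors, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

Because the graph is unweighted, the threshold controls how much the embedding model matters. Embeddings that separate topics more cleanly, as the study reports for SikuBERT, produce a sparser and more informative graph.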
19. Pull Request Description Generation Method Based on Change-Tree Retrieval
Authors: 蒋竞, 刘子豪, 张莉, 汪亮. 软件学报 (Journal of Software), 2024, no. 11, pp. 5065-5082 (18 pages). Indexed in: EI, CSCD, PKU Core.
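As open-source AI systems grow in scale, software development and maintenance become harder. GitHub is one of the most important hosting platforms for open-source projects, and its pull request system makes it easy for developers to contribute to them. A pull request description helps the project's core team understand the content of the pull request and the developer's intent, which makes the pull request more likely to be accepted. However, a considerable proportion of developers provide no description for their pull requests. This adds to the core team's workload and hinders later project maintenance.

This paper proposes PRSim, a method that automatically generates descriptions for pull requests. PRSim extracts features of a pull request, including commit messages, comment updates and code changes, and builds a syntactic change tree. It encodes the tree with a tree-structured autoencoder to retrieve other pull requests with similar code changes. Referring to the descriptions of those similar pull requests, it uses an encoder-decoder network to summarize the commit messages and comment updates into a description for the new pull request.

Experimental results show that PRSim reaches F1 scores of 36.47%, 27.69% and 35.37% on ROUGE-1, ROUGE-2 and ROUGE-L, respectively. These are improvements of 34.3%, 75.2% and 55.3% over the existing method LeadCM, 16.2%, 22.9% and 16.8% over Attn+PG+RL, and 23.5%, 72.0% and 24.8% over PRHAN.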
Keywords: pull request; syntactic change tree; similarity computation; automatic summarization; open-source community
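The results above are reported as ROUGE F1 scores. As a reminder of what ROUGE-N measures, here is a minimal sketch of n-gram precision, recall and F1 against a single reference, with clipped overlap counts. It is a simplification: it uses lowercase whitespace tokens, no stemming and one reference. It will not match the official ROUGE toolkit exactly, and ROUGE-L, which is based on the longest common subsequence, is not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F1 with clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # each n-gram counts at most min(count) times
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, the candidate "fix null pointer in parser" against the reference "fix null pointer bug in the parser" has perfect unigram precision but recall of 5/7. Its bigram F1 drops to 0.4, because word order matters there.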
20. Research on Chinese Automatic Text Summarization Based on Two-Stage Contrastive Learning
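Authors: 杨子健, 郭卫斌. 华东理工大学学报(自然科学版) (Journal of East China University of Science and Technology, Natural Science Edition), 2024, no. 4, pp. 586-593 (8 pages). Indexed in: CAS, CSCD, PKU Core.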
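Exposure bias is a common phenomenon in Chinese automatic text summarization. When a sequence-to-sequence summarization model is trained, every decoder input token comes from the ground-truth sample. At test time, however, the current input is the model's own output for the previous token. Predicted tokens are therefore inferred from different distributions during training and testing, and this inconsistency creates a gap between the model as trained and the model as tested. This paper proposes a two-stage contrastive learning framework for training abstractive summarization of Chinese text. It applies contrastive learning both to training the summarization model and to modeling summary evaluation. Experiments were run on the Large-scale Chinese Short Text Summarization dataset (LCSTS) and the dataset of the Conference on Natural Language Processing and Chinese Computing (NLPCC). Compared with baseline models, the proposed method obtains higher ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores and better mitigates exposure bias.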
Keywords: Chinese automatic text summarization; contrastive learning; exposure bias; preprocessing models; ROUGE metric