Prompt learning has attracted broad attention in computer vision since large pre-trained vision-language models (VLMs) exploded. Based on the close relationship between vision and language information built by VLMs, prompt learning has become a crucial technique in many important applications such as artificial intelligence generated content (AIGC). In this survey, we provide a progressive and comprehensive review of visual prompt learning as related to AIGC. We begin by introducing VLMs, the foundation of visual prompt learning. Then, we review visual prompt learning methods and prompt-guided generative models, and discuss how to improve the efficiency of adapting AIGC models to specific downstream tasks. Finally, we provide some promising research directions concerning prompt learning.
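The simplest form of the prompting the survey covers is hand-crafted text prompts for a CLIP-style VLM, where each class name is wrapped in an ensemble of templates whose text embeddings are averaged into a zero-shot classifier weight. A minimal sketch; the templates and class names below are illustrative, not taken from the survey:

```python
# Hypothetical prompt templates for a CLIP-style vision-language model.
TEMPLATES = [
    "a photo of a {}.",
    "a blurry photo of a {}.",
    "a sketch of a {}.",
]

def build_prompts(class_names, templates=TEMPLATES):
    """Return, per class, the ensemble of prompt strings whose text
    embeddings a VLM would average into one classifier weight."""
    return {name: [t.format(name) for t in templates] for name in class_names}
```

Learnable prompt methods replace these fixed strings with continuous vectors optimized on downstream data, but the interface to the VLM is the same.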
Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research, as they contain rich data and textual information. With the rapid development of science and technology, a large number of textual reports have accumulated in the field of geology. However, many non-mainstream topics and non-English-speaking regions are neglected in mainstream geoscience databases for geological information mining, making it more challenging for some researchers to extract the necessary information from these texts. Natural Language Processing (NLP) has clear advantages in processing large amounts of textual data. The objective of this paper is to identify geological named entities in Chinese geological texts using NLP techniques. We propose the RoBERTa-Prompt-Tuning-NER method, which leverages the concept of prompt learning and requires only a small amount of annotated data to train superior models for recognizing geological named entities in low-resource dataset configurations. The RoBERTa layer captures context-based information and longer-distance dependencies through dynamic word vectors. Finally, we conducted experiments on the constructed Geological Named Entity Recognition (GNER) dataset. The experimental results show that the proposed model achieves the highest F1 score, 80.64%, among the four baseline algorithms, demonstrating the reliability and robustness of the model for named entity recognition on geological texts.
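Prompt-based NER typically recasts entity typing as a cloze task: a candidate span is wrapped in a template, and a masked language model such as RoBERTa fills the mask with a label word that a verbalizer maps to an entity type. A minimal sketch; the template and label words are hypothetical, not the paper's actual ones:

```python
def build_ner_prompt(sentence: str, span: str, mask_token: str = "[MASK]") -> str:
    """Wrap a candidate span in a cloze template; a masked language model
    then fills the mask with a label word (e.g. 'rock', 'mineral',
    'location') that a verbalizer maps to an entity type."""
    return f"{sentence} In this sentence, {span} is a {mask_token} entity."
```

For example, `build_ner_prompt("Granite outcrops occur in the north.", "Granite")` produces a sentence ending in `... Granite is a [MASK] entity.`, turning span classification into the model's pre-training objective.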
This paper explores the Vision Transformer (ViT) backbone for Unsupervised Domain Adaptive (UDA) person Re-Identification (Re-ID). While some recent studies have validated ViT for supervised Re-ID, no study has yet used ViT for UDA Re-ID. We observe that the ViT structure provides a unique advantage for UDA Re-ID: it has a prompt (the learnable class token) at its bottom layer that can be used to efficiently condition the deep model on the underlying domain. To exploit this advantage, we propose a novel two-stage UDA pipeline named Prompting And Tuning (PAT), which consists of a prompt learning stage and a subsequent fine-tuning stage. In the first stage, PAT roughly adapts the model from the source to the target domain by learning prompts for the two domains, while in the second stage, PAT fine-tunes the entire backbone for further adaptation to increase accuracy. Although both stages adopt pseudo labels for training, we show that they have different data preferences. With these two preferences, prompt learning and fine-tuning integrate well with each other and jointly yield a competitive PAT method for UDA Re-ID.
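The two-stage idea reduces to a parameter-selection rule: which tensors receive gradient updates in each stage. A schematic sketch with illustrative parameter names, not PAT's actual implementation:

```python
def trainable_parameters(stage: int, param_names: list) -> list:
    """Stage 1 (prompt learning): update only the source/target domain
    prompts and keep the ViT backbone frozen.
    Stage 2 (fine-tuning): update the entire backbone as well."""
    if stage == 1:
        return [n for n in param_names if n.startswith("prompt.")]
    return list(param_names)
```

In a real training loop, this list would determine which parameters are passed to the optimizer before each stage begins.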
Code comments are crucial in software engineering, aiding program maintenance and code reuse. The process of generating clear and descriptive code comments that outline code functionality is called code summarization. Existing code summarization methods are typically trained using transformer-based models. However, these trained models often possess limited parameters and lack specific training tasks, hindering their ability to capture code semantics effectively. This paper uses a high-capacity pre-trained model, CodeT5, for code summarization. CodeT5 is designed with an encoder-decoder architecture that excels at code summarization tasks. Furthermore, we adopt a novel paradigm, "pre-train, prompt, predict", to unlock the knowledge embedded within CodeT5. We devise a prompt template to convert input code into code prompts and fine-tune CodeT5 with these prompts, a process we term prompt tuning. Our effectiveness experiments demonstrate that prompt tuning CodeT5 with only 40% of the dataset can achieve performance comparable to fine-tuning CodeT5 with 100% of the dataset, which makes our approach applicable to few-shot learning scenarios. Additionally, our prompt learning method is not sensitive to the size of the tuning dataset. Our practicality experiments show that the performance of prompt-tuned CodeT5 far surpasses that of transformer-based models trained on code-comment datasets collected from Stack Overflow.
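The "prompt" step boils down to wrapping raw source code in a textual template before it reaches the encoder, so that the task is stated in the same natural-language form the model saw during pre-training. A hypothetical template; the paper's exact wording may differ:

```python
def make_summarization_prompt(code: str, language: str = "java") -> str:
    """Convert raw source code into a code prompt for an encoder-decoder
    model such as CodeT5; the decoder then generates the comment."""
    return f"Summarize {language}: {code.strip()}"
```

During prompt tuning, pairs of such prompts and their reference comments are fed to the model exactly like any other sequence-to-sequence training example.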
Instructional videos are very useful for completing complex daily tasks and naturally contain abundant clip-narration pairs. Existing works on procedure understanding are keen on pretraining various video-language models with these pairs and then finetuning downstream classifiers and localizers in a predetermined category space. These video-language models are proficient at representing short-term actions, basic objects, and their combinations, but they are still far from understanding long-term procedures. In addition, the predetermined procedure category space faces a combinatorial explosion and is inherently inapt for unseen procedures. Therefore, we propose a novel compositional prompt learning (CPL) framework that understands long-term procedures by prompting short-term video-language models and reformulating several classical procedure understanding tasks into general video-text matching problems. Specifically, the proposed CPL consists of one visual prompt and three compositional textual prompts (the action prompt, object prompt, and procedure prompt), which compositionally distill knowledge from short-term video-language models to facilitate long-term procedure understanding. Moreover, the task reformulation enables our CPL to perform well in zero-shot, few-shot, and fully supervised settings. Extensive experiments on two widely used procedure understanding datasets demonstrate the effectiveness of the proposed approach.
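The compositional idea can be sketched as text composition: instead of one classifier head per procedure category, action, object, and procedure descriptions are combined into a sentence that a short-term video-language model matches against a clip. The template below is illustrative, not CPL's actual prompt:

```python
def compose_procedure_prompt(action: str, obj: str, goal: str) -> str:
    """Compose action/object/procedure descriptions into one sentence for
    video-text matching, avoiding a fixed category space: any unseen
    (action, object, goal) triple yields a valid query."""
    return f"a video of {action} the {obj} in order to {goal}"
```

Because matching replaces classification, zero-shot, few-shot, and fully supervised settings all use the same interface.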
Recent years have seen the wide application of natural language processing (NLP) models in crucial areas such as finance, medical treatment, and news media, raising concerns about model robustness and vulnerabilities. We find that the prompt paradigm can probe special robustness defects of pre-trained language models. Malicious prompt texts are first constructed for inputs, and a pre-trained language model can then generate adversarial examples for victim models via mask-filling. Experimental results show that the prompt paradigm can efficiently generate more diverse adversarial examples than synonym substitution. We then propose a novel robust training approach based on the prompt paradigm, which incorporates prompt texts as alternatives to adversarial examples and enhances robustness under a lightweight minimax-style optimization framework. Experiments on three real-world tasks and two deep neural models show that our approach can significantly improve the robustness of models against adversarial attacks.
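The mask-filling generation step can be sketched independently of any particular model: mask each position in turn and let a pluggable masked language model propose a substitution, yielding candidates to test against the victim classifier. The `fill` callable below stands in for a real MLM; this is a simplified sketch, not the paper's full pipeline:

```python
def masked_variants(tokens, mask_token="[MASK]"):
    """Yield one copy of the input per position, with that token masked."""
    for i in range(len(tokens)):
        yield tokens[:i] + [mask_token] + tokens[i + 1:]

def adversarial_candidates(tokens, fill, mask_token="[MASK]"):
    """Replace each mask with the MLM's proposal (fill: masked tokens ->
    replacement word), producing candidate adversarial examples."""
    out = []
    for i, masked in enumerate(masked_variants(tokens, mask_token)):
        filled = list(masked)
        filled[i] = fill(masked)
        out.append(" ".join(filled))
    return out
```

Candidates that flip the victim model's prediction while preserving meaning are kept as adversarial examples; unlike synonym substitution, the MLM is free to propose any contextually plausible word.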
Inferring the fully qualified names (FQNs) of undeclared receiving objects and non-fully-qualified type names (non-FQNs) in partial code is critical for effectively searching, understanding, and reusing partial code. Existing type inference tools, such as COSTER and SNR, rely on a symbolic knowledge base and adopt a dictionary-lookup strategy to map the simple names of undeclared receiving objects and non-FQNs to FQNs. However, building a symbolic knowledge base requires parsing compilable code files, which limits the collection of APIs and code contexts, resulting in out-of-vocabulary (OOV) failures. To overcome the limitations of a symbolic knowledge base for FQN inference, we implemented Ask Me Any Type (AMAT), a type inference plugin embedded in web browsers and integrated development environments (IDEs). Unlike the dictionary-lookup strategy, AMAT uses a cloze-style fill-in-the-blank strategy for type inference. By treating code as text, AMAT leverages a fine-tuned large language model (LLM) as a neural knowledge base, thereby avoiding the need for code compilation. Experimental results show that AMAT outperforms state-of-the-art tools such as COSTER and SNR. In practice, developers can directly reuse partial code by inferring the FQNs of unresolved type names in real time.
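A cloze-style FQN query treats the partial code as plain text and appends a fill-in-the-blank sentence for the LLM to complete. The template below is a hypothetical illustration of the strategy, not AMAT's actual prompt:

```python
def fqn_cloze_prompt(code_snippet: str, simple_name: str,
                     mask_token: str = "<mask>") -> str:
    """Build a cloze-style query asking a fine-tuned LLM to fill the mask
    with the fully qualified name of `simple_name` in the given code."""
    return (f"{code_snippet}\n"
            f"// In the code above, the fully qualified name of "
            f"{simple_name} is {mask_token}.")
```

For the snippet `List<String> xs = new ArrayList<>();`, the model would be expected to resolve `List` to `java.util.List`; no symbolic knowledge base or compilable file is needed.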
Funding: Project supported by the National Natural Science Foundation of China (Nos. 62306075 and 62101136), the China Postdoctoral Science Foundation (No. 2022TQ0069), the Natural Science Foundation of Shanghai, China (No. 21ZR1403600), the Shanghai Municipal Science and Technology Project, China (No. 20JC1419500), and the Shanghai Center for Brain Science and Brain-Inspired Technology, China.
Funding: Supported by the National Natural Science Foundation of China (Nos. 42488201, 42172137, 42050104, and 42050102), the National Key R&D Program of China (No. 2023YFF0804000), and the Sichuan Provincial Youth Science & Technology Innovative Research Group Fund (No. 2022JDTD0004).
Funding: This work was supported by the National Key Research and Development Program of China in the 13th Five-Year period (No. 2016YFB0801301) and in the 14th Five-Year period (Nos. 2021YFF0602103, 2021YFF0602102, and 20210Y1702).
Funding: National Key R&D Program of China (No. 2021AAA0140203), Zhejiang Provincial Key Research and Development Program of China (No. 2021C01164), and the National Natural Science Foundation of China (Nos. 61972384, 62132020, and 62203425).
Funding: Supported by the Key Scientific and Technological Research Projects of the Jiangxi Provincial Department of Education (GJJ2200303) and the National Social Science Foundation Major Bidding Project (20&ZD068).