Abstract
Data augmentation methods are often used to address data scarcity in natural language processing (NLP). However, token-label misalignment, where tokens are paired with incorrect entity labels in augmented sentences, prevents data augmentation methods from achieving high scores on token-level tasks such as named entity recognition (NER). In this paper, we propose embedded prompt tuning (EPT) as a novel data augmentation approach for low-resource NER. To address token-label misalignment, we implicitly embed NER labels as prompts into the hidden layers of a pre-trained language model, so that masked entity tokens can be predicted by the fine-tuned EPT model. Hence, EPT can generate high-quality and highly diverse data containing various entities, which improves NER performance. Since cross-domain NER datasets are available, we also explore NER domain adaptation with EPT. Experimental results show that EPT achieves substantial improvements over baseline methods on low-resource NER tasks.