Abstract
1 Introduction

Production inputs to Natural Language Processing (NLP) services can be manipulated with malicious intent in order to lower performance to a level that compromises the integrity of the service, without any information about the underlying model. Adversaries that perform word-level input perturbations in black-box settings first detect the "important" words, then apply noisy operations such as replacement or deletion. Gao et al. [1] assigned word importance scores (temporal scores) calculated by querying the model on partial inputs for each individual sample. Under this procedure, the victim model is queried during both the word-score calculation and the perturbation stage, which can cause a large request-response overhead. In comparison, we calculate the word importance scores (polarity scores) only once, using an external dataset and without querying the victim model.
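To make the query cost of the baseline concrete, the following is a minimal sketch of temporal word-importance scoring in the spirit of Gao et al. [1]: each word is scored by how much the model's confidence changes when the word is appended to the growing prefix, costing one victim-model query per word. The function name `temporal_scores`, the black-box callable `predict_prob`, and the toy stand-in model are all illustrative assumptions, not the authors' actual implementation.

```python
def temporal_scores(words, predict_prob):
    """Score each word by the change in the model's confidence
    between the prefix with and without that word.
    Costs one black-box query per word (plus one for the empty prefix)."""
    scores = []
    prev = predict_prob([])  # confidence on the empty prefix
    for i in range(len(words)):
        cur = predict_prob(words[:i + 1])
        scores.append(cur - prev)  # contribution attributed to words[i]
        prev = cur
    return scores

# Hypothetical stand-in for a black-box victim model: confidence
# for the "negative" class grows with occurrences of the word "bad".
def toy_predict_prob(prefix):
    return min(1.0, 0.1 + 0.4 * sum(w == "bad" for w in prefix))

scores = temporal_scores(["this", "movie", "is", "bad"], toy_predict_prob)
```

In this sketch, scoring an n-word input already costs n + 1 queries before any perturbation is attempted, which is exactly the overhead that a once-computed, query-free polarity score avoids.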