Funding: This study was supported by the National Key R&D Program of China, Grant No. 2018YFA0306703.
Abstract: The ability of the pre-trained BERT model to achieve outstanding performance on many Natural Language Processing (NLP) tasks has attracted the attention of researchers in recent times. However, its huge computational and memory requirements have hampered its widespread deployment on devices with limited resources. Knowledge distillation has been shown to produce smaller and faster distilled models with fewer trainable parameters, intended for resource-constrained environments. The distilled models can be fine-tuned with strong performance on a wide range of tasks, such as sentiment classification. This paper evaluates the performance of the DistilBERT model and other pre-canned text classifiers on a Covid-19 online news binary classification dataset. The analysis shows that, despite having fewer trainable parameters than the BERT-based model, the DistilBERT model achieved an accuracy of 0.94 on the validation set after only two training epochs. The paper also highlights the usefulness of the ktrain library in facilitating the building, training, and application of state-of-the-art Machine Learning and Deep Learning models.
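
The knowledge distillation idea mentioned in the abstract can be illustrated with a minimal sketch of a generic soft-target loss, in which a student model is trained to match a teacher's temperature-softened output distribution. This is an illustration only and not the paper's code. The function names `softmax` and `distillation_loss`, the example logits, and the temperature value are assumptions made for the example. DistilBERT's actual training objective also combines this kind of term with a masked language modeling loss and a cosine embedding loss, which are not shown here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration; the T^2 factor is the usual
    convention for keeping gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Illustrative logits for a binary classifier (made-up values).
teacher = [2.0, -1.0]
matching_student = [2.0, -1.0]
untrained_student = [0.0, 0.0]

loss_matching = distillation_loss(matching_student, teacher)
loss_untrained = distillation_loss(untrained_student, teacher)
```

A student that reproduces the teacher's logits incurs no loss, while a student with uninformative logits incurs a positive loss. Minimizing that loss pushes the smaller model toward the teacher's behavior.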