Nowadays,we can use the multi-task learning approach to train a machine-learning algorithm to learn multiple related tasks instead of training it to solve a single task.In this work,we propose an algorithm for estimat...Nowadays,we can use the multi-task learning approach to train a machine-learning algorithm to learn multiple related tasks instead of training it to solve a single task.In this work,we propose an algorithm for estimating textual similarity scores and then use these scores in multiple tasks such as text ranking,essay grading,and question answering systems.We used several vectorization schemes to represent the Arabic texts in the SemEval2017-task3-subtask-D dataset.The used schemes include lexical-based similarity features,frequency-based features,and pre-trained model-based features.Also,we used contextual-based embedding models such as Arabic Bidirectional Encoder Representations from Transformers(AraBERT).We used the AraBERT model in two different variants.First,as a feature extractor in addition to the text vectorization schemes’features.We fed those features to various regression models to make a prediction value that represents the relevancy score between Arabic text units.Second,AraBERT is adopted as a pre-trained model,and its parameters are fine-tuned to estimate the relevancy scores between Arabic textual sentences.To evaluate the research results,we conducted several experiments to compare the use of the AraBERT model in its two variants.In terms of Mean Absolute Percentage Error(MAPE),the results showminor variance between AraBERT v0.2 as a feature extractor(21.7723)and the fine-tuned AraBERT v2(21.8211).On the other hand,AraBERT v0.2-Large as a feature extractor outperforms the finetuned AraBERT v2 model on the used data set in terms of the coefficient of determination(R2)values(0.014050,−0.032861),respectively.展开更多
Aspect-based sentiment analysis(ABSA)is a fine-grained process.Its fundamental subtasks are aspect termextraction(ATE)and aspect polarity classification(APC),and these subtasks are dependent and closely related.Howeve...Aspect-based sentiment analysis(ABSA)is a fine-grained process.Its fundamental subtasks are aspect termextraction(ATE)and aspect polarity classification(APC),and these subtasks are dependent and closely related.However,most existing works on Arabic ABSA content separately address them,assume that aspect terms are preidentified,or use a pipeline model.Pipeline solutions design different models for each task,and the output from the ATE model is used as the input to the APC model,which may result in error propagation among different steps because APC is affected by ATE error.These methods are impractical for real-world scenarios where the ATE task is the base task for APC,and its result impacts the accuracy of APC.Thus,in this study,we focused on a multi-task learning model for Arabic ATE and APC in which the model is jointly trained on two subtasks simultaneously in a singlemodel.This paper integrates themulti-task model,namely Local Cotext Foucse-Aspect Term Extraction and Polarity classification(LCF-ATEPC)and Arabic Bidirectional Encoder Representation from Transformers(AraBERT)as a shred layer for Arabic contextual text representation.The LCF-ATEPC model is based on a multi-head selfattention and local context focus mechanism(LCF)to capture the interactive information between an aspect and its context.Moreover,data augmentation techniques are proposed based on state-of-the-art augmentation techniques(word embedding substitution with constraints and contextual embedding(AraBERT))to increase the diversity of the training dataset.This paper examined the effect of data augmentation on the multi-task model for Arabic ABSA.Extensive experiments were conducted on the original and combined datasets(merging the original and augmented datasets).Experimental results demonstrate that the proposed Multi-task model outperformed existing APC techniques.Superior results were obtained by AraBERT and LCF-ATEPC with fusion layer(AR-LCF-ATEPC-Fusion)and the proposed data augmentation word embedding-based method(FastText)on the combined dataset.展开更多
文摘Nowadays,we can use the multi-task learning approach to train a machine-learning algorithm to learn multiple related tasks instead of training it to solve a single task.In this work,we propose an algorithm for estimating textual similarity scores and then use these scores in multiple tasks such as text ranking,essay grading,and question answering systems.We used several vectorization schemes to represent the Arabic texts in the SemEval2017-task3-subtask-D dataset.The used schemes include lexical-based similarity features,frequency-based features,and pre-trained model-based features.Also,we used contextual-based embedding models such as Arabic Bidirectional Encoder Representations from Transformers(AraBERT).We used the AraBERT model in two different variants.First,as a feature extractor in addition to the text vectorization schemes’features.We fed those features to various regression models to make a prediction value that represents the relevancy score between Arabic text units.Second,AraBERT is adopted as a pre-trained model,and its parameters are fine-tuned to estimate the relevancy scores between Arabic textual sentences.To evaluate the research results,we conducted several experiments to compare the use of the AraBERT model in its two variants.In terms of Mean Absolute Percentage Error(MAPE),the results showminor variance between AraBERT v0.2 as a feature extractor(21.7723)and the fine-tuned AraBERT v2(21.8211).On the other hand,AraBERT v0.2-Large as a feature extractor outperforms the finetuned AraBERT v2 model on the used data set in terms of the coefficient of determination(R2)values(0.014050,−0.032861),respectively.
文摘Aspect-based sentiment analysis(ABSA)is a fine-grained process.Its fundamental subtasks are aspect termextraction(ATE)and aspect polarity classification(APC),and these subtasks are dependent and closely related.However,most existing works on Arabic ABSA content separately address them,assume that aspect terms are preidentified,or use a pipeline model.Pipeline solutions design different models for each task,and the output from the ATE model is used as the input to the APC model,which may result in error propagation among different steps because APC is affected by ATE error.These methods are impractical for real-world scenarios where the ATE task is the base task for APC,and its result impacts the accuracy of APC.Thus,in this study,we focused on a multi-task learning model for Arabic ATE and APC in which the model is jointly trained on two subtasks simultaneously in a singlemodel.This paper integrates themulti-task model,namely Local Cotext Foucse-Aspect Term Extraction and Polarity classification(LCF-ATEPC)and Arabic Bidirectional Encoder Representation from Transformers(AraBERT)as a shred layer for Arabic contextual text representation.The LCF-ATEPC model is based on a multi-head selfattention and local context focus mechanism(LCF)to capture the interactive information between an aspect and its context.Moreover,data augmentation techniques are proposed based on state-of-the-art augmentation techniques(word embedding substitution with constraints and contextual embedding(AraBERT))to increase the diversity of the training dataset.This paper examined the effect of data augmentation on the multi-task model for Arabic ABSA.Extensive experiments were conducted on the original and combined datasets(merging the original and augmented datasets).Experimental results demonstrate that the proposed Multi-task model outperformed existing APC techniques.Superior results were obtained by AraBERT and LCF-ATEPC with fusion layer(AR-LCF-ATEPC-Fusion)and the proposed data augmentation word embedding-based method(FastText)on the combined dataset.