Embedding Extraction for Arabic Text Using the AraBERT Model

Abstract: Multi-task learning trains a machine-learning algorithm on several related tasks rather than on a single task. In this work, we propose an algorithm for estimating textual similarity scores and then use these scores in multiple tasks such as text ranking, essay grading, and question answering systems. We used several vectorization schemes to represent the Arabic texts in the SemEval2017-task3-subtask-D dataset. The schemes include lexical-based similarity features, frequency-based features, and pre-trained model-based features. We also used contextual-based embedding models such as Arabic Bidirectional Encoder Representations from Transformers (AraBERT). We used the AraBERT model in two variants. First, as a feature extractor alongside the features from the text vectorization schemes; we fed those features to various regression models to predict a value that represents the relevancy score between Arabic text units. Second, as a pre-trained model whose parameters are fine-tuned to estimate the relevancy scores between Arabic sentences. To evaluate the results, we conducted several experiments comparing the two variants. In terms of Mean Absolute Percentage Error (MAPE), the results show only a minor difference between AraBERT v0.2 as a feature extractor (21.7723) and the fine-tuned AraBERT v2 (21.8211). On the other hand, AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on this dataset in terms of the coefficient of determination (R2), with values of 0.014050 and −0.032861, respectively.
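The two metrics quoted in the abstract can be illustrated with a minimal sketch, assuming the standard textbook definitions of MAPE (reported here as a percentage) and R2. The abstract does not state the exact formulas or scaling the authors used, and the function names and toy scores below are hypothetical, not data from the paper.

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (assumes no true value is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical relevancy scores, for illustration only.
gold = [1.0, 2.0, 4.0]
close_preds = [1.5, 2.0, 3.0]   # roughly tracks the gold scores
poor_preds = [4.0, 2.0, 1.0]    # worse than always predicting the mean

print(mape(gold, close_preds))
print(r2(gold, close_preds))
print(r2(gold, poor_preds))     # negative: the fit is worse than the mean baseline
```

Under these definitions, an R2 below zero, such as the −0.032861 reported for the fine-tuned model, means the predictions explain less variance than a constant prediction equal to the mean of the true scores.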

Authors: Amira Hamed Abo-Elghit, Taher Hamza, Aya Al-Zoghby

Affiliations: Faculty of Computers and Information; Faculty of Computers and Artificial Intelligence

Source: Computers, Materials & Continua (English-language journal), SCIE, EI, 2022, Issue 7, pp. 1967-1994 (28 pages)

Keywords: semantic textual similarity; Arabic language; embeddings; AraBERT; pre-trained models; regression; contextual-based models; concurrency concept

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
