Journal Articles
42 articles found
Word Embeddings and Semantic Spaces in Natural Language Processing (Cited by: 1)
1
Authors: Peter J. Worth. International Journal of Intelligence Science, 2023, No. 1, pp. 1-21 (21 pages).
One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) in the last two decades has been the development of text representation techniques that solve the so-called curse of dimensionality. This problem plagues NLP in general because the feature set for learning starts as a function of the vocabulary size of the language in question, typically upwards of hundreds of thousands of terms. As such, much of the research and development in NLP in the last two decades has gone into finding and optimizing solutions to this problem, which is effectively feature selection for NLP. This paper looks at the development of these techniques, which leverage a variety of statistical methods resting on linguistic theories advanced in the middle of the last century, namely the distributional hypothesis, which suggests that words found in similar contexts generally have similar meanings. In this survey we look at the development of some of the most popular of these techniques from both a mathematical and a data structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants, typically referred to as word embeddings. In this review of algorithms such as Word2Vec, GloVe, ELMo and BERT, we also explore the idea of semantic spaces more generally, beyond their applicability to NLP.
Keywords: Natural Language Processing; Vector Space Models; Semantic Spaces; word embeddings; Representation Learning; Text Vectorization; Machine Learning; Deep Learning
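To make the distributional hypothesis concrete, here is a minimal Python sketch (not taken from the survey) that builds count-based co-occurrence vectors from a toy corpus and compares them with cosine similarity. The corpus, the window size of 2, and the helper names are illustrative assumptions; LSA and modern embeddings go further by compressing such vectors into a dense low-dimensional space.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Count how often each word appears within `window` tokens of another word."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # 0.917: shared contexts
print(round(cosine(vecs["cat"], vecs["car"]), 3))  # 0.5: only "the" in common
```

"cat" and "dog" occur in similar contexts ("drinks", "chases"), so their vectors end up closer than "cat" and "car".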
Aspect-Based Sentiment Classification Using Deep Learning and Hybrid of Word Embedding and Contextual Position
2
Authors: Waqas Ahmad, Hikmat Ullah Khan, Fawaz Khaled Alarfaj, Saqib Iqbal, Abdullah Mohammad Alomair, Naif Almusallam. Intelligent Automation & Soft Computing (SCIE), 2023, No. 9, pp. 3101-3124 (24 pages).
Aspect-based sentiment analysis aims to detect sentiment polarities and classify them as negative, positive, or neutral while associating them with the aspects identified in the corresponding context. Prior methodologies widely use either word embeddings or tree-based representations. However, using deep features such as word embeddings and tree-based dependencies separately has become a significant cause of information loss. Word embeddings generally preserve the syntactic and semantic relations between terms in a sentence, while the tree-based structure preserves the grammatical and logical dependencies of the context. In addition, the position of a word within the sentence is a critical factor that influences the contextual information of the target sentence, so position-oriented information about words is considered significant. In this study, we combine word embeddings, tree-based representations, and contextual position information and evaluate whether the combination improves effectiveness. Their joint use improves the identification and extraction of target aspect terms, which in turn influences the classification process. We propose a method named Attention-Based Multi-Channel Convolutional Neural Network (Att-MC-CNN) that jointly uses these three deep features: word embeddings, the tree-based structure, and contextual position information. These three inputs are fed into a Multi-Channel Convolutional Neural Network (MC-CNN) that identifies and extracts the candidate terms and classifies their polarities. The terms are further filtered by an attention mechanism, which determines the most significant words. Empirical analysis on standard datasets shows that the proposed approach is more effective than existing techniques: it achieves an overall F1 score of 94% for aspect identification and 92% for sentiment classification.
Keywords: Sentiment analysis; word embedding; aspect extraction; consistency tree; multichannel convolutional neural network; contextual position information
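The abstract does not give the exact encoding of "contextual position information". A common approach, assumed here for illustration, is to compute each token's signed distance to the aspect term and turn it into a weight that favors nearby words. The function names and the linear weighting below are hypothetical, not the Att-MC-CNN formulas.

```python
def relative_positions(tokens, aspect_start, aspect_end):
    """Signed distance of each token to the aspect span [aspect_start, aspect_end).

    Tokens inside the aspect get 0; tokens to the left get negative offsets,
    tokens to the right get positive offsets.
    """
    positions = []
    for i in range(len(tokens)):
        if i < aspect_start:
            positions.append(i - aspect_start)
        elif i >= aspect_end:
            positions.append(i - aspect_end + 1)
        else:
            positions.append(0)
    return positions

def position_weights(positions, sentence_length):
    """Map offsets to weights in (0, 1]: tokens closer to the aspect weigh more."""
    return [1.0 - abs(p) / sentence_length for p in positions]

tokens = "the battery life is great but the screen is dim".split()
# The aspect term "battery life" spans tokens 1 and 2.
pos = relative_positions(tokens, 1, 3)
print(pos)  # [-1, 0, 0, 1, 2, 3, 4, 5, 6, 7]
weights = position_weights(pos, len(tokens))
```

In a multi-channel model, such position features would typically be embedded and stacked alongside the word-embedding channel.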
Enhanced Image Captioning Using Features Concatenation and Efficient Pre-Trained Word Embedding
3
Authors: Samar Elbedwehy, T. Medhat, Taher Hamza, Mohammed F. Alrahmawy. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 9, pp. 3637-3652 (16 pages).
One of the problems in Computer Vision is automatically generating descriptions of images, known as image captioning. Deep Learning techniques have made significant progress in this area. A typical image captioning system consists of an image feature extractor subsystem followed by a caption generation (language) subsystem. This paper aims to find optimized models for both subsystems. For image feature extraction, the study tested eight different concatenations of pairs of vision models to find the most expressive image feature vector. For caption generation, it tested three pre-trained language embedding models, GloVe (Global Vectors for Word Representation), BERT (Bidirectional Encoder Representations from Transformers), and TaCL (Token-aware Contrastive Learning), to select the most accurate one. The experiments showed that the best combination is an image captioning system that uses a concatenation of two Transformer-based models, SWIN (Shifted Window) and PVT (Pyramid Vision Transformer), as the image feature extractor, combined with the TaCL language embedding model.
Keywords: Image captioning; word embedding; concatenation; Transformer
Novel Representations of Word Embedding Based on the Zolu Function
4
Authors: Jihua Lu, Youcheng Zhang. Journal of Beijing Institute of Technology (EI, CAS), 2020, No. 4, pp. 526-530 (5 pages).
Two learning models based on the Zolu function, Zolu-continuous bag of words (ZL-CBOW) and Zolu-skip-gram (ZL-SG), are proposed. The Zolu function changes the slope of the ReLU used in word2vec. The proposed models can process extremely large data sets just as word2vec does, without increasing the complexity. They also outperform several word embedding methods in both word similarity and syntactic accuracy. ZL-CBOW outperforms CBOW in accuracy by 8.43% on the capital-world training set and by 1.24% on the plural-verbs training set. Moreover, experimental simulations on word similarity and syntactic accuracy show that ZL-CBOW and ZL-SG are superior to LL-CBOW and LL-SG, respectively.
Keywords: Zolu function; word embedding; continuous bags of words; word similarity; accuracy
Learning Better Word Embedding by Asymmetric Low-Rank Projection of Knowledge Graph (Cited by: 2)
5
Authors: Fei Tian, Bin Gao, En-Hong Chen, Tie-Yah Liu. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2016, No. 3, pp. 624-634 (11 pages).
Word embedding, which refers to low-dimensional dense vector representations of natural words, has demonstrated its power in many natural language processing tasks. However, it may suffer from the inaccurate and incomplete information contained in the free-text corpus used as training data. To tackle this challenge, quite a few studies have leveraged knowledge graphs as an additional information source to improve the quality of word embeddings. Although these studies have achieved some success, they have neglected important facts about knowledge graphs: 1) many relationships in knowledge graphs are many-to-one, one-to-many, or even many-to-many, rather than simply one-to-one; 2) most head entities and tail entities in knowledge graphs come from very different semantic spaces. To address these issues, we propose a new algorithm named ProjectNet. ProjectNet models the relationships between head and tail entities after transforming them with different low-rank projection matrices. The low-rank projection allows non-one-to-one relationships between entities, while using different projection matrices for head and tail entities allows them to originate in different semantic spaces. Experimental results demonstrate that ProjectNet yields more accurate word embeddings than previous studies and thus leads to clear improvements in various natural language processing tasks.
Keywords: natural language processing; word embedding; neural network; knowledge graph
Measuring Similarity of Academic Articles with Semantic Profile and Joint Word Embedding (Cited by: 11)
6
Authors: Ming Liu, Bo Lang, Zepeng Gu, Ahmed Zeeshan. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2017, No. 6, pp. 619-632 (14 pages).
Long-document semantic measurement is significant in many applications, such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts. Document-level semantic measurement remains an open issue because of problems such as missing background knowledge and topic transitions. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile that includes and enriches key semantic elements such as the research purpose, methodology, and domain. The overall semantic similarity of two papers is then obtained by computing the distance between their profiles, where the distances between the concepts of the two profiles are measured with word vectors. To improve the semantic representation quality of the word vectors, we propose a joint word-embedding model that incorporates a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that our approach achieves substantial improvement over state-of-the-art methods in measuring document semantic similarity, and that our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
Keywords: document semantic similarity; text understanding; semantic enrichment; word embedding; scientific literature analysis
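The idea of comparing two papers through their semantic profiles can be sketched as follows. This is a hypothetical illustration, not the paper's method: each profile maps an element (research purpose, methodology, ...) to a list of concepts, concept sets are compared by a symmetric best-match average of word-vector cosines, and the per-element scores are combined with assumed weights. The 2-D toy vectors stand in for learned embeddings.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def concept_set_similarity(concepts_a, concepts_b, vectors):
    """Symmetric best-match average of cosine similarities between two concept lists."""
    def directed(xs, ys):
        return sum(max(cosine(vectors[x], vectors[y]) for y in ys) for x in xs) / len(xs)
    return 0.5 * (directed(concepts_a, concepts_b) + directed(concepts_b, concepts_a))

def profile_similarity(profile_a, profile_b, vectors, weights):
    """Weighted average of per-element similarities (e.g. purpose, method)."""
    total = sum(weights.values())
    return sum(
        w * concept_set_similarity(profile_a[key], profile_b[key], vectors)
        for key, w in weights.items()
    ) / total

# Toy 2-D "word vectors" (assumed, for illustration only).
vectors = {
    "retrieval": [1.0, 0.0],
    "search": [0.9, 0.1],
    "cnn": [0.0, 1.0],
    "convolution": [0.1, 0.9],
}
weights = {"purpose": 1.0, "method": 1.0}
paper_a = {"purpose": ["retrieval"], "method": ["cnn"]}
paper_b = {"purpose": ["search"], "method": ["convolution"]}
paper_c = {"purpose": ["cnn"], "method": ["retrieval"]}
print(round(profile_similarity(paper_a, paper_b, vectors, weights), 3))
print(round(profile_similarity(paper_a, paper_c, vectors, weights), 3))
```

Paper B uses different words than paper A but near-synonymous concepts, so it scores far higher than paper C, whose concepts are swapped across profile elements.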
News recommendation based on time factor and word embedding (Cited by: 1)
7
Authors: Gu Yiran, Zhou Peng, Yang Haigen. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2021, No. 5, pp. 82-90 (9 pages).
Existing news recommendation algorithms lack in-depth analysis of news texts and do not account for timeliness. To address these issues, an algorithm for news recommendation based on time factor and word embedding (TFWE) was proposed to improve the interpretability and precision of news recommendations. First, TFWE used term frequency-inverse document frequency (TF-IDF) to extract news feature words and used the bidirectional encoder representations from transformers (BERT) pre-trained model to convert the feature words into vector representations. By calculating the distance between the vectors, TFWE analyzed semantic similarity to construct a user interest model. Second, considering the timeliness of news, a method was proposed for calculating news popularity by integrating time factors into the similarity calculation. Finally, TFWE combined the similarity of news content with the similarity from collaborative filtering (CF) and recommended the highest-ranked news to users. Experiments on a real dataset showed that TFWE significantly improved precision, recall, and F1 score compared with the classic hybrid recommendation algorithm.
Keywords: news recommendation; time factor; word embedding; user interest model
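The final ranking step (blend content similarity with CF similarity, then account for timeliness) might look like the sketch below. The exponential half-life decay, the blend weight `alpha`, and all function names are assumptions for illustration; the abstract does not specify TFWE's exact time-factor formula.

```python
def time_decay(age_hours, half_life_hours=24.0):
    """Exponential decay: a news item loses half its weight every half-life."""
    return 0.5 ** (age_hours / half_life_hours)

def hybrid_score(content_sim, cf_sim, age_hours, alpha=0.6, half_life_hours=24.0):
    """Blend content-based and CF similarity, then discount by the item's age."""
    blended = alpha * content_sim + (1.0 - alpha) * cf_sim
    return blended * time_decay(age_hours, half_life_hours)

def recommend(candidates, top_k=2, **kwargs):
    """candidates: list of (news_id, content_sim, cf_sim, age_hours)."""
    scored = [(nid, hybrid_score(c, f, age, **kwargs)) for nid, c, f, age in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [nid for nid, _ in scored[:top_k]]

candidates = [
    ("n1", 0.90, 0.80, 48.0),  # very relevant but two days old
    ("n2", 0.70, 0.60, 2.0),   # fairly relevant and fresh
    ("n3", 0.20, 0.30, 1.0),   # fresh but barely relevant
]
print(recommend(candidates))  # ['n2', 'n3']
```

With a 24-hour half-life the two-day-old item drops below even the weakly relevant fresh one; lengthening `half_life_hours` shifts the ranking back toward pure relevance.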
Syntactic word embedding based on dependency syntax and polysemous analysis
8
Authors: Zhong-lin YE, Hai-xing ZHAO. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2018, No. 4, pp. 524-535 (12 pages).
Most word embedding models have the following problems: (1) models based on bag-of-words contexts completely neglect the structural relations of sentences; (2) each word uses a single embedding, which leaves the model unable to discriminate between the senses of polysemous words; (3) word embeddings easily tend toward the contextual structure similarity of sentences. To solve these problems, we propose an easy-to-use representation algorithm for syntactic word embedding (SWE). The main procedures are: (1) a polysemous tagging algorithm based on latent Dirichlet allocation (LDA) is used for polysemous representation; (2) the symbols '+' and '-' indicate the directions of dependency relations; (3) stopwords and their dependencies are deleted; (4) dependency skip is applied to connect indirect dependencies; (5) dependency-based contexts are fed into a word2vec model. Experimental results show that our model generates desirable word embeddings in similarity evaluation tasks. In addition, semantic and syntactic features can be captured from dependency-based syntactic contexts, which exhibit less topical and more syntactic similarity. We conclude that SWE outperforms single-embedding learning models.
Keywords: Dependency-based context; Polysemous word representation; Representation learning; Syntactic word embedding
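Steps (2) to (4) of the SWE procedure can be sketched as a function that turns dependency arcs into (word, context) training pairs for a word2vec-style model. The arc format, the context string layout (`word/relation+` for a head seeing its dependent, `word/relation-` for the reverse), the stopword list, and the way skip arcs are labeled are all assumptions; the LDA-based polysemy tagging of step (1) is omitted.

```python
STOPWORDS = {"the", "a", "with", "of"}

def dependency_contexts(tokens, arcs, stopwords=STOPWORDS):
    """Build (word, context) pairs from dependency arcs.

    arcs: list of (head_idx, dep_idx, relation).
    A head sees its dependent as "dep/rel+"; a dependent sees its head as "head/rel-".
    Arcs touching a stopword are dropped, and head -> stopword -> child chains
    are collapsed into a single head -> child "skip" arc.
    """
    is_stop = [t in stopwords for t in tokens]
    # Keep arcs between two content words.
    kept = [(h, d, rel) for h, d, rel in arcs if not is_stop[h] and not is_stop[d]]
    # Dependency skip: bridge over a stopword to reach its content-word children.
    for h, s, rel1 in arcs:
        if is_stop[s] and not is_stop[h]:
            for s2, d, rel2 in arcs:
                if s2 == s and not is_stop[d]:
                    kept.append((h, d, f"{rel1}_{rel2}"))
    pairs = []
    for h, d, rel in kept:
        pairs.append((tokens[h], f"{tokens[d]}/{rel}+"))
        pairs.append((tokens[d], f"{tokens[h]}/{rel}-"))
    return pairs

tokens = ["scientist", "discovers", "star", "with", "telescope"]
arcs = [(1, 0, "nsubj"), (1, 2, "dobj"), (1, 3, "prep"), (3, 4, "pobj")]
pairs = dependency_contexts(tokens, arcs)
for word, ctx in pairs:
    print(word, ctx)
```

The stopword "with" disappears, but "discovers" still receives "telescope" as a context through the skip arc `prep_pobj`, which a linear bag-of-words window of size 2 would miss.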
Relation Reconstructive Binarization of word embeddings
9
Authors: Feiyang PAN, Shuokai LI, Xiang AO, Qing HE. Frontiers of Computer Science (SCIE, EI, CSCD), 2022, No. 2, pp. 47-54 (8 pages).
Word embeddings are one of the backbones of modern natural language processing (NLP). Recently, with the need to deploy NLP models on low-resource devices, there has been a surge of interest in compressing word embeddings into hash codes or binary vectors to save storage and memory. Typically, existing work learns to encode an embedding into a compressed representation from which the original embedding can be reconstructed. Although these methods aim to preserve most of the information of each individual word, they often fail to retain the relations between words and can therefore incur large losses on certain tasks. To this end, this paper presents Relation Reconstructive Binarization (R2B) to transform word embeddings into binary codes that preserve the relations between words. At its heart, R2B trains an auto-encoder to generate binary codes from which the word-by-word relations in the original embedding space can be reconstructed. Experiments showed that our method achieved significant improvements over previous methods on a number of tasks, along with space savings of up to 98.4%. Specifically, our method achieved even better results on word similarity evaluation than the uncompressed pre-trained embeddings, and was significantly better than previous compression methods that do not consider word relations.
Keywords: embedding compression; variational auto-encoder; binary word embedding
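The quantity R2B cares about, how well word-by-word relations survive binarization, can be measured with a short sketch. Here naive sign binarization stands in for R2B's learned auto-encoder (an assumption; it is the kind of baseline R2B improves on), and "relation" is taken to be pairwise cosine similarity. For {0,1} codes read as ±1 vectors, cosine similarity equals 1 - 2·hamming/length.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def sign_binarize(vec):
    """Naive baseline: one bit per dimension, set when the value is non-negative."""
    return [1 if x >= 0 else 0 for x in vec]

def code_similarity(bits_a, bits_b):
    """Cosine of the equivalent ±1 vectors: 1 - 2 * hamming / length."""
    hamming = sum(a != b for a, b in zip(bits_a, bits_b))
    return 1.0 - 2.0 * hamming / len(bits_a)

def relation_loss(embeddings, codes):
    """Mean squared gap between word-by-word similarities before and after binarization."""
    words = list(embeddings)
    gaps = []
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            original = cosine(embeddings[w1], embeddings[w2])
            compressed = code_similarity(codes[w1], codes[w2])
            gaps.append((original - compressed) ** 2)
    return sum(gaps) / len(gaps)

# Toy embeddings (assumed values, for illustration only).
embeddings = {
    "king": [0.8, 0.3, -0.2, 0.5],
    "queen": [0.7, 0.4, -0.1, 0.6],
    "apple": [-0.6, 0.1, 0.9, -0.3],
}
codes = {w: sign_binarize(v) for w, v in embeddings.items()}
print(codes["king"], codes["apple"])  # [1, 1, 0, 1] [0, 1, 1, 0]
print(round(relation_loss(embeddings, codes), 4))
```

A relation-preserving method would train its encoder to drive `relation_loss` down rather than to reconstruct each vector individually.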
Word Embedding Bootstrapped Deep Active Learning Method to Information Extraction on Chinese Electronic Medical Record
10
Authors: MA Qunsheng, CEN Xingxing, YUAN Junyi, HOU Xumin. Journal of Shanghai Jiaotong University (Science) (EI), 2021, No. 4, pp. 494-502 (9 pages).
Electronic medical records (EMRs) contain rich biomedical information and have great potential in disease diagnosis and biomedical research. However, EMR information is usually in the form of unstructured text, which increases the cost of use and hinders its applications. In this work, an effective named entity recognition (NER) method is presented for information extraction from Chinese EMRs. It is achieved by word embedding bootstrapped deep active learning, to promote the acquisition of medical information from Chinese EMRs and to release its value. Deep active learning of a bi-directional long short-term memory network followed by a conditional random field (Bi-LSTM+CRF) is used to capture the characteristics of different information from a labeled corpus, and the continuous bag-of-words and skip-gram word embedding models are combined in this model to capture text features of Chinese EMRs from an unlabeled corpus. To evaluate the method, NER tasks on Chinese EMRs with "medical history" content were used. Experimental results show that the word embedding bootstrapped deep active learning method using an unlabeled medical corpus achieves better performance than other models.
Keywords: deep active learning; named entity recognition (NER); information extraction; word embedding; Chinese electronic medical record (EMR)
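The two embedding models named in this abstract differ in how they turn unlabeled text into training examples: skip-gram predicts each context item from the center item, while continuous bag of words (CBOW) predicts the center item from its context. The sketch below shows only that example-generation step on a toy character sequence; the sample text and window size are assumptions, and the actual training (and the Bi-LSTM+CRF model) is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center, context) pair per context position."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: one (context list, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        context = [
            tokens[j]
            for j in range(max(0, i - window), min(len(tokens), i + window + 1))
            if j != i
        ]
        examples.append((context, center))
    return examples

# Character-level tokens of a made-up EMR-style phrase ("patient has had a fever for three days").
tokens = list("患者发热三天")
print(skipgram_pairs(tokens, window=1)[:3])  # [('患', '者'), ('者', '患'), ('者', '发')]
print(cbow_examples(tokens, window=1)[2])    # (['者', '热'], '发')
```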
How Do Pronouns Affect Word Embedding
11
Authors: Tonglee Chung, Bin Xu, Yongbin Liu, Juanzi Li, Chunping Ouyang. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2017, No. 6, pp. 586-594 (9 pages).
Word embedding has drawn a lot of attention because of its usefulness in many NLP tasks. So far, a handful of neural-network-based word embedding algorithms have been proposed without considering the effects of pronouns in the training corpus. In this paper, we propose using co-reference resolution to improve word embeddings by extracting better contexts. We evaluate four word embeddings that take co-reference resolution into account and compare the quality of the embeddings on word analogy and word similarity tasks across multiple data sets. Experiments show that using co-reference resolution improves word embedding performance on the word analogy task by around 1.88%. We find that words that are names of countries are affected the most, as expected.
Keywords: word embedding; co-reference resolution; representation learning
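One simple way to "extract better context" with co-reference resolution is to substitute each pronoun with its antecedent before building training windows. The sketch assumes the co-reference clusters are already available (e.g. from an external resolver) and treats the first mention in each cluster as the antecedent; both are simplifying assumptions, not necessarily the paper's exact procedure.

```python
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

def resolve_pronouns(tokens, clusters):
    """Replace pronoun mentions with the first mention of their co-reference cluster.

    clusters: list of lists of token indices referring to the same entity.
    """
    resolved = list(tokens)
    for cluster in clusters:
        antecedent = tokens[cluster[0]]
        for idx in cluster[1:]:
            if tokens[idx].lower() in PRONOUNS:
                resolved[idx] = antecedent
    return resolved

def context_window(tokens, i, window=2):
    """Words within `window` positions of token i, excluding token i itself."""
    return [tokens[j] for j in range(max(0, i - window), min(len(tokens), i + window + 1)) if j != i]

tokens = "France is large and it exports wine".split()
resolved = resolve_pronouns(tokens, clusters=[[0, 4]])
print(resolved)                     # ['France', 'is', 'large', 'and', 'France', 'exports', 'wine']
print(context_window(resolved, 4))  # ['large', 'and', 'exports', 'wine']
```

Without resolution, "exports" and "wine" would be contexts of "it"; after resolution they become contexts of "France", which is consistent with the finding that country names benefit most.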
Hybrid Scalable Researcher Recommendation System Using Azure Data Lake Analytics
12
Authors: Dinesh Kalla, Nathan Smith, Fnu Samaah, Kiran Polimetla. Journal of Data Analysis and Information Processing, 2024, No. 1, pp. 76-88 (13 pages).
This research paper provides the methodology and design for implementing a hybrid author recommender system using Azure Data Lake Analytics and Power BI. It recommends the top 1,000 computer science authors in different fields of study. The technique handles inadequate citation information, which removes the cold-start problem encountered by many other recommender systems. Abstracts, titles, and the Microsoft Academic Graph are used to build the recommendation list for each document, combining content-based approaches with co-citations. Tunable system parameters allow each technique to be prioritized and blended, trading off the authority of the recommended results against paper novelty. Finally, we observe a direct correlation between the similarity rankings produced by the system and the participants' scores. The results from the associated analysis scripts and the user survey have been made available through the recommendation system. Managers must gain the required expertise to fully utilize the benefits of business intelligence systems [1]. Data mining has become an important tool that gives managers insights into their daily operations, and they can leverage the information provided by decision support systems to improve customer relationships [2]. Additionally, managers require business intelligence systems that can rank output in order of priority. Ranking algorithms can replace the traditional data mining algorithms, which are discussed in depth in the literature review [3].
Keywords: Azure Data Lake; U-SQL; Author Recommendation System; Power BI; Microsoft Academic; Big Data; word embedding
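The blending of content-based and co-citation signals can be sketched in a few lines. The co-citation count (how many papers cite both A and B) is standard; the normalization and the single `weight` parameter are hypothetical stand-ins for the paper's tunable system parameters, and the real system runs in U-SQL on Azure Data Lake rather than in Python.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_lists):
    """Count how often each unordered pair of papers is cited together."""
    counts = Counter()
    for refs in citing_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

def blended_score(content_sim, cocites, max_cocites, weight=0.5):
    """weight=1.0 uses only content similarity; weight=0.0 uses only co-citation."""
    cocite_norm = cocites / max_cocites if max_cocites else 0.0
    return weight * content_sim + (1.0 - weight) * cocite_norm

# Each inner list is the reference list of one citing paper (toy data).
citing = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "D"]]
counts = cocitation_counts(citing)
print(counts[("A", "B")])  # 2
print(round(blended_score(0.9, counts[("A", "D")], max(counts.values())), 2))  # 0.7
```

The content term still produces a score for a paper that nobody has cited yet, which is how a hybrid design mitigates the cold-start problem mentioned above.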
Neural Machine Translation Models with Attention-Based Dropout Layer
13
Authors: Huma Israr, Safdar Abbas Khan, Muhammad Ali Tahir, Muhammad Khuram Shahzad, Muneer Ahmad, Jasni Mohamad Zain. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 2981-3009 (29 pages).
In bilingual translation, attention-based Neural Machine Translation (NMT) models are used to achieve synchrony between input and output sequences and the notion of alignment. NMT models have obtained state-of-the-art performance for several language pairs. However, there has been little work exploring useful architectures for Urdu-to-English machine translation. We conducted extensive Urdu-to-English translation experiments using Long Short-Term Memory (LSTM), Bidirectional Recurrent Neural Networks (Bi-RNN), Statistical Recurrent Units (SRU), Gated Recurrent Units (GRU), Convolutional Neural Networks (CNN), and the Transformer. Experimental results show that Bi-RNN and LSTM with an attention mechanism, trained iteratively on a scalable data set, make precise predictions on unseen data. These trained models yielded competitive results, achieving 62.6% and 61% accuracy and BLEU scores of 49.67 and 47.14, respectively. From a qualitative perspective, the translations of the test sets were examined manually, and the trained models were observed to produce repetitive output more frequently. The attention scores produced by Bi-RNN and LSTM gave clear alignments, while GRU produced incorrect word translations, poor alignment, and no clear structure. We therefore refined the attention-based models by adding an attention-based dropout layer. Attention dropout fixes alignment errors and minimizes word-level translation errors. After empirical demonstration and comparison with their counterparts, we found improved translation quality and decreases in perplexity and over-translation score. The proposed model was also evaluated on Arabic-English and Persian-English datasets. We empirically concluded that adding an attention-based dropout layer helps improve GRU, SRU, and Transformer translation and is considerably more efficient in translation quality and speed.
Keywords: Natural language processing; neural machine translation; word embedding; attention; perplexity; selective dropout regularization; Urdu; Persian; Arabic; BLEU
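The abstract does not define the attention-based dropout layer precisely. A generic form, assumed here for illustration, applies inverted dropout to the attention weights during training, so the decoder cannot rely on a single source position. The sketch uses plain dot-product attention; the function names and the dropout rate are hypothetical.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_dropout(weights, rate, rng, training=True):
    """Inverted dropout on attention weights: zero some source positions, rescale the rest."""
    if not training or rate == 0.0:
        return list(weights)
    keep = 1.0 - rate
    return [w / keep if rng.random() < keep else 0.0 for w in weights]

def attend(query, keys, values, rate=0.1, rng=None, training=True):
    """Dot-product attention over encoder states, with dropout applied to the weights."""
    if rng is None:
        rng = random.Random(0)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = attention_dropout(softmax(scores), rate, rng, training)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ctx_eval, w_eval = attend(query, keys, values, training=False)
ctx_train, w_train = attend(query, keys, values, rate=0.5, rng=random.Random(42))
```

At inference time (`training=False`) the layer is an identity, so the weights still sum to 1; during training each weight is either dropped or scaled by 1/(1 - rate), which keeps its expected value unchanged.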
Quantum Particle Swarm Optimization with Deep Learning-Based Arabic Tweets Sentiment Analysis
14
Authors: Badriyya B. Al-onazi, Abdulkhaleq Q.A. Hassan, Mohamed K. Nour, Mesfer Al Duhayyim, Abdullah Mohamed, Amgad Atta Abdelmageed, Ishfaq Yaseen, Gouse Pasha Mohammed. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 2575-2591 (17 pages).
Sentiment Analysis (SA), a Machine Learning (ML) technique, is widely applied in the literature, specifically to data collected from social media sites. Earlier research on the SA of tweets mostly aimed at automating the feature extraction process. Against this background, the current study introduces a novel method called Quantum Particle Swarm Optimization with Deep Learning-Based Sentiment Analysis on Arabic Tweets (QPSODL-SAAT). The QPSODL-SAAT model determines and classifies the sentiments of tweets written in Arabic. First, data pre-processing converts the raw tweets into a usable format. Then, the word2vec model generates the feature vectors. A Bidirectional Gated Recurrent Unit (BiGRU) classifier identifies and classifies the sentiments. Finally, the QPSO algorithm is used to fine-tune the hyperparameters of the BiGRU model. The proposed QPSODL-SAAT model was experimentally validated on standard datasets. In an extensive comparative analysis, the proposed model achieved a maximum accuracy of 98.35%. The outcomes confirmed the superiority of QPSODL-SAAT over other approaches, such as Surface Features (SF), Generic Embeddings (GE), Arabic Sentiment Embeddings constructed using the Hybrid model (ASEH), and Bidirectional Encoder Representations from Transformers (BERT).
Keywords: Sentiment analysis; Arabic tweets; quantum particle swarm optimization; deep learning; word embedding
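For readers unfamiliar with QPSO, here is a minimal sketch of the standard quantum-behaved PSO update (particles sample around a local attractor between their personal best and the global best, with a spread set by the distance to the mean best position). The quadratic objective below is a hypothetical stand-in for the BiGRU validation loss over two hyperparameters (log10 learning rate and hidden size); the paper's exact QPSO settings are not given in the abstract.

```python
import math
import random

def qpso_minimize(objective, bounds, n_particles=20, n_iters=100, beta=0.75, seed=0):
    """Quantum-behaved PSO minimizing `objective` inside the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]

    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    pbest = [list(x) for x in xs]
    pbest_val = [objective(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = list(pbest[g]), pbest_val[g]

    for _ in range(n_iters):
        # Mean of all personal best positions ("mbest").
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            new_x = []
            for d in range(dim):
                phi = rng.random()
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                u = 1.0 - rng.random()  # in (0, 1], so log(1/u) is finite
                step = beta * abs(mbest[d] - xs[i][d]) * math.log(1.0 / u)
                new_x.append(attractor + step if rng.random() < 0.5 else attractor - step)
            xs[i] = clip(new_x)
            val = objective(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = list(xs[i]), val
                if val < gbest_val:
                    gbest, gbest_val = list(xs[i]), val
    return gbest, gbest_val

def toy_validation_loss(params):
    """Hypothetical loss surface with its minimum at lr = 1e-3, 64 hidden units."""
    log_lr, hidden = params
    return (log_lr + 3.0) ** 2 + ((hidden - 64.0) / 32.0) ** 2

best, best_val = qpso_minimize(toy_validation_loss, bounds=[(-5.0, -1.0), (16.0, 256.0)])
print([round(v, 2) for v in best], round(best_val, 4))
```

In a real tuning loop each objective call would train and validate a model, so the particle count and iteration budget would be far smaller than in this toy.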
Personality Assessment Based on Natural Stream of Thoughts Empowered with Machine Learning
15
Authors: Mohammed Salahat, Liaqat Ali, Taher M. Ghazal, Haitham M. Alzoubi. Computers, Materials & Continua (SCIE, EI), 2023, No. 7, pp. 1-17 (17 pages).
Knowing each other is essential in a multi-agent collaborative environment. Collaborators may develop the desired knowledge of each other in various aspects such as habits, job roles, status, and behaviors. Among the distinguishing characteristics of a person, personality traits are an effective predictor of an individual's behavioral patterns. It has been observed that when people are asked to share their details through questionnaires, they intentionally or unintentionally become biased, whereas in open writing about themselves they knowingly or unknowingly provide enough information in a much less biased manner. Such writing can be used to assess an individual's personality traits effectively, which opens up many possible applications, such as forensics, job interviews, and mental health diagnosis. Stream of consciousness, as collected by James Pennebaker and Laura King, is one such form of writing: a narrative technique in which the writer's emotions and thoughts are presented so that the reader follows the fluid mental states of the narrator. Moreover, various computational attempts have been made to assess an individual's personality traits with deep learning algorithms; however, the effectiveness and reliability of the results vary with the word embedding technique used. This article proposes an empirical approach to assessing personality by applying convolutional networks to text documents. The Bidirectional Encoder Representations from Transformers (BERT) word embedding technique is used to generate word vectors with enhanced contextual meaning.
Keywords: Personality traits; convolutional neural network; deep learning; word embedding
Translation of English Language into Urdu Language Using LSTM Model
16
Authors: Sajadul Hassan Kumhar, Syed Immamul Ansarullah, Akber Abid Gardezi, Shafiq Ahmad, Abdelaty Edrees Sayed, Muhammad Shafiq. Computers, Materials & Continua (SCIE, EI), 2023, No. 2, pp. 3899-3912 (14 pages).
English-to-Urdu machine translation is still in its infancy and lacks simple methods that provide adequate English-to-Urdu translation. To make knowledge available to the masses, there should be mechanisms and tools that make texts understandable by automatically translating them from a source language into a target language. Machine translation has achieved this goal with encouraging results. When decoding the source text into the target language, the translator checks all the characteristics of the text. Rule-based, computational, hybrid, and neural machine translation approaches have been proposed to automate this work. In this research, a neural machine translation approach is employed to translate English text into Urdu, using a Long Short-Term Memory (LSTM) encoder-decoder. The steps required to perform the translation task include preprocessing, tokenization, grammar and sentence structure analysis, word embeddings, training data preparation, encoder-decoder modeling, and output text generation. The results show that the model performs better in translation. The results were evaluated using bilingual research metrics and showed that the test and training data yielded the highest-scoring sequences with an effective length of ten (10).
Keywords: Machine translation; Urdu language; word embedding
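The preprocessing, tokenization, and training-data preparation steps listed above typically end in fixed-length integer sequences that an encoder-decoder can consume. The sketch below covers the English (source) side only and pads or truncates to ten tokens, matching the sequence length mentioned in the abstract; the regex tokenizer, the reserved `<pad>`/`<unk>` ids, and the helper names are assumptions. The Urdu side would need its own tokenizer and vocabulary.

```python
import re

def tokenize(sentence):
    """Lowercase and keep word characters only (a simplifying assumption)."""
    return re.findall(r"\w+", sentence.lower())

def build_vocab(sentences):
    """Index 0 is reserved for padding and 1 for unknown words."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for s in sentences:
        for tok in tokenize(s):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(sentence, vocab, max_len=10):
    """Map tokens to ids, then truncate or right-pad to exactly max_len ids."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(sentence)][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

train = ["How are you?", "I am fine."]
vocab = build_vocab(train)
print(encode("How are you, friend?", vocab))  # [2, 3, 4, 1, 0, 0, 0, 0, 0, 0]
```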
Improved Metaheuristics with Deep Learning Enabled Movie Review Sentiment Analysis
17
作者 Abdelwahed Motwakel Najm Alotaibi +5 位作者 Eatedal Alabdulkreem Hussain Alshahrani MohamedAhmed Elfaki Mohamed K Nour Radwa Marzouk Mahmoud Othman 《Computer Systems Science & Engineering》 SCIE EI 2023年第10期1249-1266,共18页
Sentiment Analysis(SA)of natural language text is not only a challenging process but also gains significance in various Natural Language Processing(NLP)applications.The SA is utilized in various applications,namely,ed... Sentiment Analysis(SA)of natural language text is not only a challenging process but also gains significance in various Natural Language Processing(NLP)applications.The SA is utilized in various applications,namely,education,to improve the learning and teaching processes,marketing strategies,customer trend predictions,and the stock market.Various researchers have applied lexicon-related approaches,Machine Learning(ML)techniques and so on to conduct the SA for multiple languages,for instance,English and Chinese.Due to the increased popularity of the Deep Learning models,the current study used diverse configuration settings of the Convolution Neural Network(CNN)model and conducted SA for Hindi movie reviews.The current study introduces an Effective Improved Metaheuristics with Deep Learning(DL)-Enabled Sentiment Analysis for Movie Reviews(IMDLSA-MR)model.The presented IMDLSA-MR technique initially applies different levels of pre-processing to convert the input data into a compatible format.Besides,the Term Frequency-Inverse Document Frequency(TF-IDF)model is exploited to generate the word vectors from the pre-processed data.The Deep Belief Network(DBN)model is utilized to analyse and classify the sentiments.Finally,the improved Jellyfish Search Optimization(IJSO)algorithm is utilized for optimal fine-tuning of the hyperparameters related to the DBN model,which shows the novelty of the work.Different experimental analyses were conducted to validate the better performance of the proposed IMDLSA-MR model.The comparative study outcomes highlighted the enhanced performance of the proposed IMDLSA-MR model over recent DL models with a maximum accuracy of 98.92%. 展开更多
Keywords: Corpus linguistics; sentiment analysis; natural language processing; deep learning; word embedding
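The TF-IDF step in this entry can be illustrated with a minimal pure-Python sketch. The abstract does not say which TF-IDF weighting the authors used, so this assumes term frequency normalised by document length and idf = ln(N/df); the function name `tfidf_vectors` and the toy reviews are hypothetical, and English tokens stand in for Hindi text.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    # Document frequency: how many documents contain each term at least once.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


# Toy, already pre-processed (tokenized) reviews; hypothetical data.
reviews = [
    "great movie great acting".split(),
    "boring movie".split(),
    "great story".split(),
]
vocab, vecs = tfidf_vectors(reviews)
```

In practice a library implementation such as scikit-learn's `TfidfVectorizer` would typically be used; by default it smooths the idf and L2-normalises each row, so its weights differ from this simple variant.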
An Intelligent Deep Neural Sentiment Classification Network
18
Authors: Umamaheswari Ramalingam, Senthil Kumar Murugesan, Karthikeyan Lakshmanan, Chidhambararajan Balasubramaniyan. Intelligent Automation & Soft Computing (SCIE), 2023, No. 5, pp. 1733-1744 (12 pages)
A Deep Neural Sentiment Classification Network (DNSCN) is developed in this work to classify Twitter data unambiguously. It extracts negative and positive sentiments from a Twitter database; the main goal of the system is to find the sentiment behavior of tweets with minimum ambiguity. A well-defined neural network automatically extracts deep features from the tweets. Before these successively deeper features are extracted, the text of each tweet is represented using Bag-of-Words (BoW) and Word Embedding (WE) models. The effectiveness of the DNSCN architecture is analyzed on the Twitter-Sanders-Apple2 (TSA2), Twitter-Sanders-Apple3 (TSA3), and Twitter-DataSet (TDS) datasets. TSA2 and TDS contain positive and negative tweets, whereas TSA3 also contains neutral tweets. Thus, DNSCN acts as a binary classifier for TSA2 and TDS and as a multiclass classifier for TSA3. The performance of the DNSCN architecture is evaluated by F1 score, precision, and recall using 5-fold and 10-fold cross-validation. Results show that the DNSCN-WE model achieves higher accuracy than the DNSCN-BoW model for encoding tweets as features. The F1 score of the DNSCN-BW based system is 0.98 on TSA2 (binary classification) and 0.97 on TSA3 (three-class classification), and the system achieves a better F1 score of 0.99 on TDS.
Keywords: Deep neural network; word embeddings; bag-of-words; sentiment analysis; text classification
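The two tweet encodings compared in this entry can be sketched in plain Python. This is a hedged illustration, not the DNSCN pipeline: averaging word vectors is only one simple way to turn word embeddings into a fixed-length input, and the abstract does not say whether the authors averaged vectors or fed the full sequence to the network. The helper names and the 2-dimensional `toy_embeddings` table are hypothetical; a real system would load pretrained vectors.

```python
def bag_of_words(tokens, vocab):
    """Count vector over a fixed vocabulary; out-of-vocabulary tokens are ignored."""
    index = {word: i for i, word in enumerate(vocab)}
    vec = [0] * len(vocab)
    for tok in tokens:
        if tok in index:
            vec[index[tok]] += 1
    return vec


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of tokens found in `embeddings` (zero vector if none)."""
    known = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


# Hypothetical 2-dimensional embeddings for illustration only.
toy_embeddings = {
    "love": [1.0, 0.0],
    "hate": [-1.0, 0.0],
    "iphone": [0.0, 1.0],
}
vocab = ["love", "hate", "iphone", "battery"]
tweet = "i love my iphone".split()

bow = bag_of_words(tweet, vocab)
emb = mean_embedding(tweet, toy_embeddings, dim=2)
```

The BoW vector has one dimension per vocabulary word, while the averaged embedding keeps the fixed dimension of the embedding table regardless of vocabulary size.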
A Data Mining Approach to Detecting Bias and Favoritism in Public Procurement
19
Authors: Yeferson Torres-Berru, Vivian F. Lopez-Batista, Lorena Conde Zhingre. Intelligent Automation & Soft Computing (SCIE), 2023, No. 6, pp. 3501-3516 (16 pages)
In a public procurement process, corruption can occur at any stage, favoring a participant with a prior agreement; this can result in overpricing, purchases of substandard products, and gender discrimination. This paper aims to detect biased purchases using a Spanish-language corpus, analyzing text from the questions-and-answers registry platform used by applicants in public procurement processes in Ecuador. Gender bias is also detected, to promote participation by men and women under the same conditions. To detect gender bias and favoritism toward certain providers by contracting entities, the study proposes a hybrid model that combines Artificial Intelligence algorithms and Natural Language Processing (NLP). In the experimental work, 303,076 public procurement processes spanning 10 years (since 2010) were analyzed, with 1,009,739 questions and answers exchanged between suppliers and public institutions across these processes. Gender bias and favoritism were analyzed using a Word2vec model with word embeddings, together with sentiment analysis of the questions and answers using the VADER algorithm. In 32% of cases (96,984 answers), responses from contracting entities showed evidence of favoritism or gender bias. The proposed model achieves accuracy rates of 88% for detecting favoritism and 90% for detecting gender bias. Consequently, one-third of the procurement processes carried out by the state show indications of corruption and bias. Because government corruption is one of the most significant challenges in Latin America, the resulting classifier is useful for detecting bias and favoritism in public procurement processes.
Keywords: Favoritism; bias; natural language processing; word2vec; sentiment analysis; word embeddings
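One common way to probe gender associations with Word2vec embeddings is to compare a word's cosine similarity to male and female anchor words. The sketch below shows that idea only; the abstract does not describe how the authors' hybrid model scores bias, and the Spanish anchor words (`hombre`, `mujer`), the 3-dimensional vectors, and the `gender_lean` helper are all hypothetical stand-ins for a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gender_lean(word_vec, male_vec, female_vec):
    """Positive: closer to the male anchor; negative: closer to the female anchor."""
    return cosine(word_vec, male_vec) - cosine(word_vec, female_vec)


# Hypothetical vectors standing in for a Word2vec model trained on Spanish text.
vectors = {
    "hombre": [1.0, 0.0, 0.0],
    "mujer": [0.0, 1.0, 0.0],
    "ingeniero": [0.9, 0.1, 0.4],
    "proveedor": [0.5, 0.5, 0.7],
}
leans = {
    word: gender_lean(vectors[word], vectors["hombre"], vectors["mujer"])
    for word in ("ingeniero", "proveedor")
}
```

With real data, the vectors would come from a model trained on the procurement corpus (for example with gensim's `Word2Vec`), and thresholding scores like these is only one assumed way to flag potentially biased responses.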
Suggestion Mining from Opinionated Text of Big Social Media Data (cited by: 6)
20
Authors: Youseef Alotaibi, Muhammad Noman Malik, Huma Hayat Khan, Anab Batool, Saif ul Islam, Abdulmajeed Alsufyani, Saleh Alghamdi. Computers, Materials & Continua (SCIE, EI), 2021, No. 9, pp. 3323-3338 (16 pages)
Social media data are rapidly increasing and constitute a source of user opinions and tips on a wide range of products and services. The growing availability of such big data in biased reviews and blogs makes it difficult for customers and businesses to review all of the content in their decision-making process. Extracting suggestions from opinionated text is one possible solution to this challenge. In this study, the characteristics of suggestions are analyzed and a suggestion mining extraction process is presented for classifying suggestive sentences in online customer reviews. Classification uses a word-embedding approach with the XGBoost classifier. The two datasets used in the experiments consist of online hotel reviews and Microsoft Windows App Studio discussion reviews. F1, precision, recall, and accuracy scores are calculated. The results demonstrate that the XGBoost classifier outperforms the other classifiers compared, with an accuracy of more than 80%. Moreover, the results reveal that suggestion keywords and phrases are the predominant features for suggestion extraction. Thus, this study contributes to knowledge and practice by comparing feature extraction classifiers and identifying XGBoost as a better suggestion mining process for online reviews.
Keywords: Suggestion mining; word embedding; Naïve Bayes; random forest; XGBoost; dataset
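Because this study reports that suggestion keywords and phrases were the predominant features, the keyword component can be sketched as binary cue-phrase features. The cue list `SUGGESTION_CUES` and the `cue_features` helper are hypothetical; the paper's features also include word embeddings, and the XGBoost classifier itself is not shown.

```python
import re

# Hypothetical cue phrases chosen for illustration, not taken from the paper.
SUGGESTION_CUES = ["should", "would be nice", "please add", "recommend", "it would help"]


def cue_features(sentence, cues=SUGGESTION_CUES):
    """One binary feature per cue: 1 if the cue occurs as whole words, else 0."""
    text = sentence.lower()
    return [1 if re.search(r"\b" + re.escape(cue) + r"\b", text) else 0 for cue in cues]


# Toy review sentences (hypothetical data).
reviews = [
    "The hotel should offer late checkout.",
    "Breakfast was delicious.",
    "It would be nice to have a dark mode, please add it.",
]
features = [cue_features(s) for s in reviews]
```

Each row of `features` could be concatenated with embedding features and passed to a gradient-boosted tree classifier such as `xgboost.XGBClassifier`; in a real system the cue list would be curated or learned from the review data.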