Journal Articles
3 articles found
1. Paragraph Vector Representation Based on Word to Vector and CNN Learning (Cited by: 5)
Authors: Zeyu Xiong, Qiangqiang Shen, Yijie Wang, Chenyang Zhu. Computers, Materials & Continua (SCIE, EI), 2018, No. 5, pp. 213-227, 15 pages.
Document processing in natural language includes retrieval, sentiment analysis, theme extraction, etc. Classical methods for handling these tasks are based on probabilistic models, semantic models, and machine-learning networks. The probability model inherently loses semantic information, which reduces processing accuracy. Machine learning approaches include supervised, unsupervised, and semi-supervised approaches; labeled corpora are necessary for the semantic model and for supervised learning. A reliably labeled corpus is obtained manually, which is costly and time-consuming because people have to read and annotate each document. Recently, the continuous bag-of-words (CBOW) model has proven efficient for learning high-quality distributed vector representations, and it can capture a large number of precise syntactic and semantic word relationships. This model can easily be extended to learn paragraph vectors, but the resulting vectors are not precise. To address these problems, this paper develops a new model for learning paragraph vectors: we combine the CBOW model and CNNs to establish a new deep learning model. Experimental results show that paragraph vectors generated by the new model outperform those generated by the CBOW model in semantic relatedness and accuracy.
Keywords: distributed word vector; distributed paragraph vector; CNNs; CBOW; deep learning
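The abstract does not specify how CBOW and the CNN are combined. As a hedged illustration only, the sketch below contrasts two ways of turning a sequence of pre-trained word vectors into a fixed-length paragraph vector: simple averaging (a common baseline, assumed here rather than taken from the paper) and a one-dimensional convolution with ReLU and max-over-time pooling, the standard CNN building block for text. The function names, toy vectors, and filter weights are hypothetical; in a real model the filters would be learned.

```python
from typing import List

Vector = List[float]


def average_paragraph_vector(word_vectors: List[Vector]) -> Vector:
    """Baseline paragraph vector: the element-wise mean of the word vectors."""
    n = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[d] for vec in word_vectors) / n for d in range(dim)]


def conv_max_pool(word_vectors: List[Vector], filters: List[List[Vector]]) -> Vector:
    """One feature per filter: 1-D convolution over word positions, ReLU, max-over-time pooling.

    Each filter is a list of `width` weight vectors, one per word in the sliding window,
    so the output length equals the number of filters regardless of paragraph length.
    """
    features = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations never exceed this
        for start in range(len(word_vectors) - width + 1):
            # Dot product between the filter and the current window of word vectors.
            activation = sum(
                w * x
                for offset in range(width)
                for w, x in zip(filt[offset], word_vectors[start + offset])
            )
            best = max(best, activation)
        features.append(best)
    return features


# Hypothetical 2-dimensional word vectors for a three-word paragraph.
paragraph = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Two hypothetical width-2 filters (in a trained CNN these weights would be learned).
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]

print(average_paragraph_vector(paragraph))
print(conv_max_pool(paragraph, filters))  # [2.0, 0.0]
```

Averaging discards word order entirely, whereas each convolution filter responds to a local pattern of consecutive words; that difference is one plausible reason a CNN-based paragraph vector could capture more than a plain CBOW-style average.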
2. Application of Word Embedding to Drug Repositioning
Authors: Duc Luu Ngo, Naoki Yamamoto, Vu Anh Tran, Ngoc Giang Nguyen, Dau Phan, Favorisen Rosyking Lumbanraja, Mamoru Kubo, Kenji Satou. Journal of Biomedical Science and Engineering, 2016, No. 1, pp. 7-16, 10 pages.
As a key technology for rapid and low-cost drug development, drug repositioning is gaining popularity. In this study, a text mining approach to discovering unknown drug-disease relations was tested. Using a word embedding algorithm, the senses of over 1.7 million words were well represented in sufficiently short feature vectors. The feasibility of our approach was tested through various analyses, including clustering and classification. Finally, our trained classification model achieved 87.6% accuracy in predicting drug-disease relations in cancer treatment and succeeded in discovering novel drug-disease relations that were actually reported in recent studies.
Keywords: distributed representation of word sense; discovery of drug-disease relation; word analogy
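The keywords mention word analogy. A standard way to query word embeddings for an analogy a : b :: c : ? is the vector-offset method: compute b - a + c and return the nearest remaining word by cosine similarity. The sketch below assumes a tiny hypothetical embedding table (names such as `drug_a` and `disease_b` are placeholders); it is not the authors' pipeline, which the abstract describes as clustering plus a trained classifier.

```python
import math
from typing import Dict, List

Embedding = Dict[str, List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def analogy(emb: Embedding, a: str, b: str, c: str) -> str:
    """Solve a : b :: c : ? by the vector-offset method (nearest word to b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # Exclude the three query words, as is customary in analogy evaluation.
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Hypothetical 3-dimensional embeddings; the second axis acts as a drug-to-disease offset.
emb: Embedding = {
    "drug_a": [1.0, 0.0, 0.0],
    "disease_a": [1.0, 1.0, 0.0],
    "drug_b": [0.0, 0.0, 1.0],
    "disease_b": [0.0, 1.0, 1.0],
    "disease_c": [1.0, 1.0, 1.0],
}

print(analogy(emb, "drug_a", "disease_a", "drug_b"))  # disease_b
```

With real embeddings the offset is only approximately shared across drug-disease pairs, so the nearest neighbour is a candidate relation to be checked, not a confirmed one.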
3. Supervised Learning Algorithm on Unstructured Documents for the Classification of Job Offers: Case of Cameroun
Authors: Fritz Sosso Makembe, Roger Atsa Etoundi, Hippolyte Tapamo. Journal of Computer and Communications, 2023, No. 2, pp. 75-88, 14 pages.
Nowadays, in data science, supervised learning algorithms are frequently used to perform text classification. However, African textual data in general have been studied very little using these methods. This article notes the particularity of the data and measures the precision of the predictions of naive Bayes, decision tree, and SVM (Support Vector Machine) algorithms on a corpus of computer job offers collected from the internet. This is due to the data imbalance problem in machine learning. However, this problem essentially concerns the distribution of the number of documents in each class or subclass. Here, we delve deeper into the problem, down to the word count distribution in a set of documents. The results are compared with those obtained on a set of French IT offers. Classification precision ranges from 88% to 90% for French offers, against at most 67% for Cameroonian offers. The contribution of this study is twofold. First, it clearly shows that, within a similar job category, job offers on the internet in Cameroon are more unstructured than those available in France, for example. Second, it makes it possible to formulate a strong hypothesis: sets of texts with a symmetrical distribution of the number of words obtain better results with supervised learning algorithms.
Keywords: job offer; underemployment; text classification; imbalanced data; symmetric word distribution; supervised learning
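Of the three classifiers compared in the abstract, multinomial naive Bayes is the simplest to sketch. The version below uses whitespace tokenization and add-one (Laplace) smoothing; both choices, the function names, the toy job offers, and the labels are assumptions for illustration, not the study's corpus or preprocessing.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, float], Dict[str, Counter], Set[str]]


def train_naive_bayes(examples: List[Tuple[str, str]]) -> Model:
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    docs_per_class: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in examples:
        tokens = text.lower().split()  # assumption: plain whitespace tokenization
        docs_per_class[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = len(examples)
    log_prior = {c: math.log(k / total_docs) for c, k in docs_per_class.items()}
    return log_prior, word_counts, vocab


def predict(model: Model, text: str) -> str:
    """Return the class with the highest log-posterior score."""
    log_prior, word_counts, vocab = model
    scores = {}
    for label, prior in log_prior.items():
        class_total = sum(word_counts[label].values())
        score = prior
        for token in text.lower().split():
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing so an unseen (word, class) pair is not scored as impossible.
            score += math.log((word_counts[label][token] + 1) / (class_total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy job offers; not the corpus used in the study.
train = [
    ("python developer django backend", "developer"),
    ("java developer spring backend", "developer"),
    ("network administrator cisco routers", "sysadmin"),
    ("linux system administrator servers", "sysadmin"),
]
model = train_naive_bayes(train)
print(predict(model, "backend python developer"))  # developer
print(predict(model, "linux administrator"))  # sysadmin
```

Working in log space avoids numerical underflow when long, loosely structured offers contribute many small per-word probabilities.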
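The abstract's hypothesis concerns how symmetric the distribution of per-document word counts is. The paper's exact measure is not given here; one common choice, assumed in the sketch below, is the sample (Fisher-Pearson) skewness g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. It is 0 for a perfectly symmetric sample and positive when a few documents are much longer than the rest. The helper names and the toy corpora are hypothetical.

```python
from typing import List


def word_counts(documents: List[str]) -> List[int]:
    """Number of whitespace-separated tokens in each document."""
    return [len(doc.split()) for doc in documents]


def skewness(values: List[float]) -> float:
    """Fisher-Pearson skewness g1 = m3 / m2**1.5, using population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # every document has the same length: treat as symmetric
    return m3 / m2 ** 1.5


# Hypothetical corpora: evenly spread lengths versus one long outlier.
balanced = ["a b c d e", "a b c d e f g h i j", "a b c d e f g h i j k l m n o"]
skewed = ["a b", "a b", "a b", "a b c d e f g h i j k l m n o p q r s t"]

print(skewness(word_counts(balanced)))  # 0.0
print(skewness(word_counts(skewed)))  # about 1.15, i.e. right-skewed
```

Under the abstract's hypothesis, a corpus whose g1 is close to 0 would be expected to suit supervised classifiers better than one with a large positive g1.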