Journal Articles
14 articles found
1. A Word2vector-Based Text Feature Representation Method (Cited: 21)
Authors: 周顺先, 蒋励, 林霜巧, 龚德良, 王鲁达. 《重庆邮电大学学报(自然科学版)》, CSCD / PKU Core, 2018, No. 2, pp. 272-279 (8 pages).
To address the problem that feature representations based on word statistics cannot effectively capture the semantic features of text, a text feature representation method based on contextual relations is proposed. The method uses Word2vector to extract semantic features and obtain word vectors; the word vectors are then clustered by an "optimal fitness partition" scheme, and each word is re-represented by its cluster centroid. From the centroids and the frequencies of the words they represent, a word-vector cluster-centroid frequency model (semantic frequency-inverse document frequency, SF-IDF) is built to represent texts. Without relying on semantic rules, text classification experiments were run on the Reuters-21578 corpus and Wikipedia XML data using a neural network language model (NNLM), with the F1-measure used to evaluate classification quality, and the SF-IDF vectors were compared against conventional term frequency-inverse document frequency (TF-IDF) vectors. On Reuters-21578 the average accuracy improved from 57.1% to 63.3%, and on the Wikipedia XML dataset from 48.7% to 59.2%. The SF-IDF model can be used with existing feature-vector-based information retrieval algorithms, offers more efficient text similarity analysis than TF-IDF, and improves text classification accuracy.
Keywords: word2vector, contextual relations, feature representation, text classification
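A minimal Python sketch of the pipeline this abstract describes, assuming k-means in place of the paper's "optimal fitness partition" clustering; the toy corpus, vector size, and cluster count are illustrative:

```python
from gensim.models import Word2Vec          # gensim >= 4.0
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [["stocks", "fell", "sharply", "on", "monday"],
        ["shares", "dropped", "after", "the", "report"],
        ["the", "team", "won", "the", "football", "match"],
        ["players", "scored", "twice", "in", "the", "game"]]

# 1) Word2vector: learn word vectors from the corpus.
w2v = Word2Vec(docs, vector_size=32, window=3, min_count=1, seed=1)

# 2) Cluster the word vectors; each word is replaced by its centroid id.
words = list(w2v.wv.key_to_index)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=1).fit(w2v.wv[words])
centroid_of = dict(zip(words, kmeans.labels_))

# 3) Re-express documents as centroid ids and compute TF-IDF over them,
#    giving SF-IDF-style document vectors.
centroid_docs = [" ".join(f"c{centroid_of[w]}" for w in doc) for doc in docs]
X = TfidfVectorizer().fit_transform(centroid_docs)
print(X.shape)            # (num_docs, num_centroids actually used)
```

The resulting matrix plays the role of the SF-IDF document vectors and can be fed to any feature-vector-based classifier or retrieval method.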
2. Paragraph Vector Representation Based on Word to Vector and CNN Learning (Cited: 5)
Authors: Zeyu Xiong, Qiangqiang Shen, Yijie Wang, Chenyang Zhu. Computers, Materials & Continua, SCIE/EI, 2018, No. 5, pp. 213-227 (15 pages).
Document processing in natural language includes retrieval, sentiment analysis, theme extraction, etc. Classical methods for handling these tasks are based on models of probability, semantics, and networks for machine learning. The probability model essentially loses semantic information, which affects processing accuracy. Machine learning approaches include supervised, unsupervised, and semi-supervised approaches; labeled corpora are necessary for semantic models and supervised learning. Reliably labeled corpora are produced manually, which is costly and time-consuming because people have to read and annotate each document. Recently, the continuous CBOW model has proven efficient for learning high-quality distributed vector representations and can capture a large number of precise syntactic and semantic word relationships; the model can be easily extended to learn paragraph vectors, but it is not precise. To address these problems, this paper develops a new model for learning paragraph vectors by combining the CBOW model and CNNs into a new deep learning model. Experimental results show that paragraph vectors generated by the new model are better than those generated by the CBOW model in semantic relatedness and accuracy.
Keywords: distributed word vector, distributed paragraph vector, CNNs, CBOW, deep learning
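A minimal numpy sketch of how a paragraph vector can be built by a 1-D convolution with max-over-time pooling on top of CBOW word vectors, in the spirit of the model described above; the filter bank is random (untrained) purely to show the shape of the computation, whereas the paper learns these weights end to end:

```python
import numpy as np
from gensim.models import Word2Vec   # gensim >= 4.0

docs = [["deep", "learning", "models", "capture", "semantics"],
        ["probability", "models", "lose", "semantic", "information"]]
w2v = Word2Vec(docs, vector_size=16, window=2, min_count=1, sg=0, seed=1)  # sg=0 -> CBOW

def paragraph_vector(tokens, n_filters=8, width=3):
    """Stack word vectors, convolve with a window-`width` filter bank, max-pool over time."""
    rng = np.random.default_rng(0)                             # untrained filters, for illustration
    mat = np.stack([w2v.wv[t] for t in tokens])                # (len, dim)
    filters = rng.standard_normal((n_filters, width, mat.shape[1]))
    windows = np.stack([mat[i:i + width] for i in range(len(tokens) - width + 1)])
    feature_map = np.einsum("nwd,fwd->nf", windows, filters)   # (n_windows, n_filters)
    return feature_map.max(axis=0)                             # fixed-size paragraph vector

print(paragraph_vector(docs[0]).shape)   # (8,)
```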
3. A Sentence Similarity Calculation Method Based on Word2Vector and Edit Distance (Cited: 4)
Author: 陆尹浩. 《电脑知识与技术(过刊)》, 2017, No. 2X, pp. 146-147 (2 pages).
With the popularity of question answering systems and chatbots, comparing and processing sentence similarity has become a core component of such systems, so designing a good sentence similarity measure is increasingly important. Based on the deep learning model Word2Vector combined with the edit distance algorithm, this paper proposes a sentence similarity calculation method, presents the design in detail, verifies its effectiveness through experiments, and finally summarizes the advantages and disadvantages of the method.
Keywords: sentence similarity calculation, word2Vector, edit distance
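A minimal sketch of the kind of combined measure described above: cosine similarity of averaged Word2Vec vectors blended with a normalized edit-distance score; the blending weight alpha and the toy corpus are assumptions, not the paper's settings:

```python
import numpy as np
from gensim.models import Word2Vec   # gensim >= 4.0

corpus = [["how", "do", "i", "reset", "my", "password"],
          ["how", "can", "i", "change", "my", "password"],
          ["what", "is", "the", "weather", "today"]]
w2v = Word2Vec(corpus, vector_size=32, window=3, min_count=1, seed=1)

def edit_distance(a, b):
    """Classic Levenshtein distance on token lists."""
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i, j] = min(d[i-1, j] + 1, d[i, j-1] + 1,
                          d[i-1, j-1] + (a[i-1] != b[j-1]))
    return d[len(a), len(b)]

def similarity(s1, s2, alpha=0.5):
    v1 = np.mean([w2v.wv[w] for w in s1], axis=0)
    v2 = np.mean([w2v.wv[w] for w in s2], axis=0)
    cos = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    ed = 1.0 - edit_distance(s1, s2) / max(len(s1), len(s2))   # normalized to [0, 1]
    return alpha * cos + (1 - alpha) * ed

print(similarity(corpus[0], corpus[1]))
```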
4. Improve Neural Machine Translation by Building Word Vector with Part of Speech (Cited: 3)
Authors: Jinyingming Zhang, Jin Liu, Xinyue Lin. Journal on Artificial Intelligence, 2020, No. 2, pp. 79-88 (10 pages).
Neural Machine Translation (NMT) is an important technology for translation applications, but there is still plenty of room to improve it. In the NMT process, traditional word vectors cannot distinguish the same word under different parts of speech (POS). To alleviate this problem, this paper proposes a new word vector training method based on POS features. Adding the POS feature to the training process of word vectors efficiently improves translation quality. Extensive experiments were conducted to evaluate the method, and the results show that it improves the quality of translation from English into Chinese.
Keywords: machine translation, parts of speech, word vector
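A minimal sketch of the core idea: attach the POS tag to each token before training, so the same surface word under different parts of speech receives separate vectors; the hand-written tags keep the example self-contained (a real pipeline would use a POS tagger):

```python
from gensim.models import Word2Vec   # gensim >= 4.0

tagged_sentences = [
    ["they_PRP", "record_VB", "the_DT", "meeting_NN"],
    ["the_DT", "record_NN", "was_VBD", "broken_VBN"],
    ["she_PRP", "set_VB", "a_DT", "new_JJ", "record_NN"],
]
w2v = Word2Vec(tagged_sentences, vector_size=32, window=2, min_count=1, seed=1)

# The two senses of "record" now map to distinct embeddings,
# which is what the POS-aware NMT input relies on.
print(w2v.wv["record_VB"][:4])
print(w2v.wv["record_NN"][:4])
```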
5. Towards privacy-preserving and efficient word vector learning for lightweight IoT devices
Authors: Nan Jia, Shaojing Fu, Guangquan Xu, Kai Huang, Ming Xu. Digital Communications and Networks, SCIE, 2024, No. 4, pp. 895-903 (9 pages).
Nowadays, the Internet of Things (IoT) is widely deployed and brings great opportunities to change people's daily life. To realize more effective human-computer interaction in IoT applications, the Question Answering (QA) systems embedded in IoT services need to improve their ability to understand natural language. Therefore, distributed representations of words, which contain more semantic and syntactic information, play an increasingly important role in QA systems. However, learning high-quality distributed word vectors requires substantial storage and computing resources, so it cannot be deployed on resource-constrained IoT devices. Outsourcing the data and computation to cloud servers is a good choice, but directly uploading private data to an untrusted cloud causes privacy risks. Realizing word vector learning over untrusted cloud servers without privacy leakage is therefore an urgent and challenging task. In this paper, we present a novel efficient word vector learning scheme over encrypted data. We first design a series of arithmetic computation protocols, and then use two non-colluding cloud servers to learn high-quality word vectors over encrypted data. The proposed scheme allows word vectors to be trained on remote cloud servers while protecting privacy. Security analysis and experiments on real data sets demonstrate that our scheme is more secure and efficient than existing privacy-preserving word vector learning schemes.
Keywords: privacy-preserving, word vector learning, secret sharing, Internet of Things
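A minimal sketch of two-server additive secret sharing, the basic building block that non-colluding-server schemes like this rely on; the fixed-point encoding and modulus below are illustrative assumptions, not the paper's protocol:

```python
import numpy as np

MOD = 2 ** 32
SCALE = 2 ** 16                      # fixed-point scale for real-valued vectors

def share(x, rng):
    """Split a float vector into two additive shares modulo MOD."""
    fixed = (np.round(x * SCALE).astype(np.int64) % MOD).astype(np.uint64)
    s1 = rng.integers(0, MOD, size=x.shape, dtype=np.uint64)
    s2 = (fixed - s1) % MOD          # wraps modulo 2**64, then reduced mod MOD
    return s1, s2

def reconstruct(s1, s2):
    fixed = (s1 + s2) % MOD
    signed = np.where(fixed >= MOD // 2, fixed.astype(np.int64) - MOD, fixed.astype(np.int64))
    return signed / SCALE

rng = np.random.default_rng(0)
grad = np.array([0.25, -1.5, 3.0])           # e.g., part of a word-vector update
a1, a2 = share(grad, rng)                    # server 1 holds a1, server 2 holds a2
b1, b2 = share(np.ones(3), rng)

# Secure addition: each server adds its own shares locally; no plaintext is revealed.
print(reconstruct((a1 + b1) % MOD, (a2 + b2) % MOD))   # -> [ 1.25 -0.5   4.  ]
```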
6. Cross-Lingual Non-Ferrous Metals Related News Recognition Method Based on CNN with A Limited Bi-Lingual Dictionary (Cited: 2)
Authors: Xudong Hong, Xiao Zheng, Jinyuan Xia, Linna Wei. Computers, Materials & Continua, SCIE/EI, 2019, No. 2, pp. 379-389 (11 pages).
To acquire non-ferrous metals related news from the internet of different countries, we propose a cross-lingual non-ferrous metals related news recognition method based on a CNN with a limited bilingual dictionary. First, considering the lack of language resources related to non-ferrous metals, we use a limited bilingual dictionary and CCA to learn cross-lingual word vectors and represent news in different languages uniformly. Then, to improve recognition, we use a variant of the CNN to learn recognition features and construct the recognition model. The experimental results show that the proposed method achieves better results.
Keywords: non-ferrous metal, CNN, cross-lingual text classification, word vector
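A minimal sketch of dictionary-based CCA alignment: embeddings of the bilingual dictionary pairs are used to fit the projection, which is then applied to all words of both languages; the random matrices stand in for two monolingual embedding tables:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
emb_en = rng.standard_normal((1000, 50))     # toy English embedding table
emb_zh = rng.standard_normal((1000, 40))     # toy Chinese embedding table

# (en_index, zh_index) pairs taken from the limited bilingual dictionary.
dictionary = [(i, i) for i in range(200)]
X = emb_en[[en for en, _ in dictionary]]
Y = emb_zh[[zh for _, zh in dictionary]]

cca = CCA(n_components=10, max_iter=1000).fit(X, Y)
en_shared, zh_shared = cca.transform(emb_en, emb_zh)   # both languages in one 10-d space
print(en_shared.shape, zh_shared.shape)                # (1000, 10) (1000, 10)

# News in either language can now be represented with vectors from this shared
# space and fed to the CNN-based recognizer.
```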
7. Research on high-performance English translation based on topic model
Authors: Yumin Shen, Hongyu Guo. Digital Communications and Networks, SCIE/CSCD, 2023, No. 2, pp. 505-511 (7 pages).
Retelling extraction is an important branch of Natural Language Processing (NLP), and high-quality retelling resources are very helpful for improving the performance of machine translation. However, traditional methods based on bilingual parallel corpora often ignore the document background during retelling acquisition and application. To solve this problem, we introduce topic model information into the translation model and propose a topic-based statistical machine translation method to improve translation performance. In this method, Probabilistic Latent Semantic Analysis (PLSA) obtains the co-occurrence relationship between words and documents by hybrid matrix decomposition, and we design a decoder to simplify the decoding process. Experiments show that the proposed method effectively improves translation accuracy.
Keywords: machine translation, topic model, statistical machine translation, bilingual word vector, retelling
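A minimal numpy sketch of the PLSA step described above, factorizing a word-document co-occurrence matrix into P(word|topic) and P(topic|doc) with EM; the toy counts and topic number are illustrative, and the translation decoder is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
N = rng.integers(0, 5, size=(8, 6)).astype(float)    # word-document count matrix
n_topics = 2

p_w_z = rng.random((8, n_topics)); p_w_z /= p_w_z.sum(axis=0)    # P(word|topic)
p_z_d = rng.random((n_topics, 6)); p_z_d /= p_z_d.sum(axis=0)    # P(topic|doc)

for _ in range(50):                                   # EM iterations
    # E-step: responsibility of each topic for each (word, doc) pair.
    joint = p_w_z[:, :, None] * p_z_d[None, :, :]     # (words, topics, docs)
    resp = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
    # M-step: re-estimate both factors from expected counts.
    counts = N[:, None, :] * resp
    p_w_z = counts.sum(axis=2); p_w_z /= np.maximum(p_w_z.sum(axis=0), 1e-12)
    p_z_d = counts.sum(axis=0); p_z_d /= np.maximum(p_z_d.sum(axis=0), 1e-12)

print(p_z_d.round(3))          # topic mixture of each document
```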
8. An Optimized Chinese Filtering Model Using Value Scale Extended Text Vector
Authors: Siyu Lu, Ligao Cai, Zhixin Liu, Shan Liu, Bo Yang, Lirong Yin, Mingzhe Liu, Wenfeng Zheng. Computer Systems Science & Engineering, SCIE/EI, 2023, No. 11, pp. 1881-1899 (19 pages).
With the development of Internet technology, the explosive growth of Internet information has made it difficult to filter out effective information. Finding a highly accurate model for text classification has become a critical problem for text filtering, especially for Chinese texts. This paper uses manually calibrated comment data from the Douban movie website. First, a text filtering model based on a BP neural network is built. Second, based on the Term Frequency-Inverse Document Frequency (TF-IDF) vector space model and the doc2vec method, the text word-frequency vector and the text semantic vector are obtained, and the word-frequency vector is linearly reduced in dimension by Principal Component Analysis (PCA). Third, the reduced word-frequency vector, the semantic vector, and the text value degree are combined to construct the text synthesis vector. Experiments show that the model combining the dimension-reduced word-frequency vector, the semantic vector, and the text value degree reaches the highest accuracy of 84.67%.
Keywords: Chinese text filtering, text vector, word frequency vectors, text semantic vectors, value degree, BP neural network, TF-IDF, doc2vec, PCA
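A minimal sketch of the text synthesis vector described above: a PCA-reduced TF-IDF vector, a doc2vec semantic vector, and a per-document value degree are concatenated and fed to a BP-style network (sklearn's MLPClassifier here); the toy comments and value scores are made up:

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument   # gensim >= 4.0
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

comments = ["great film wonderful acting", "boring plot waste of time",
            "touching story great direction", "terrible pacing fell asleep",
            "brilliant photography loved it", "awful script very boring"]
labels = [1, 0, 1, 0, 1, 0]                                  # keep / filter
value_degree = np.array([0.9, 0.2, 0.8, 0.1, 0.95, 0.15])    # assumed per-comment score

# Word-frequency vector (TF-IDF), linearly reduced with PCA.
tfidf = TfidfVectorizer().fit_transform(comments).toarray()
tfidf_reduced = PCA(n_components=3).fit_transform(tfidf)

# Semantic vector from doc2vec.
tagged = [TaggedDocument(c.split(), [i]) for i, c in enumerate(comments)]
d2v = Doc2Vec(tagged, vector_size=16, min_count=1, epochs=40, seed=1)
semantic = np.array([d2v.infer_vector(c.split()) for c in comments])

# Text synthesis vector = [PCA(TF-IDF) | doc2vec | value degree].
X = np.hstack([tfidf_reduced, semantic, value_degree[:, None]])
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1).fit(X, labels)
print(clf.score(X, labels))
```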
9. News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model (Cited: 10)
Authors: Ao Xiong, Derong Liu, Hongkang Tian, Zhengyuan Liu, Peng Yu, Michel Kadoch. Tsinghua Science and Technology, SCIE/EI/CAS/CSCD, 2021, No. 6, pp. 886-893 (8 pages).
The internet is an abundant source of news every day, so efficient algorithms for extracting keywords from text are important for obtaining information quickly. However, the precision and recall of mature keyword extraction algorithms need improvement. TextRank, which is derived from the PageRank algorithm, uses word graphs to spread the weight of words, but its keyword weight propagation focuses only on word frequency. To improve performance, we propose Semantic Clustering TextRank (SCTR), a semantic-clustering news keyword extraction algorithm based on TextRank. First, the word vectors generated by the Bidirectional Encoder Representations from Transformers (BERT) model are clustered by k-means to capture semantic clusters. Then, the clustering results are used to construct a TextRank weight transfer probability matrix. Finally, word graphs are computed iteratively and keywords are extracted. The experiments were conducted on a Chinese news corpus, and the results show that the SCTR algorithm achieves greater precision, recall, and F1 value than the traditional TextRank and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms.
Keywords: keyword extraction, TextRank, semantics, word vector
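A minimal numpy sketch of the SCTR idea: cluster word vectors, bias the TextRank transition matrix so that weight flows preferentially between words in the same semantic cluster, then rank by power iteration; Word2Vec stands in for the BERT vectors used in the paper, and the window size and same-cluster boost are illustrative assumptions:

```python
import numpy as np
from gensim.models import Word2Vec          # gensim >= 4.0
from sklearn.cluster import KMeans

sentences = [["central", "bank", "raises", "interest", "rates"],
             ["rates", "rise", "hits", "housing", "market"],
             ["housing", "prices", "fall", "as", "rates", "climb"]]
w2v = Word2Vec(sentences, vector_size=32, window=3, min_count=1, seed=1)
vocab = list(w2v.wv.key_to_index)
idx = {w: i for i, w in enumerate(vocab)}
cluster = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(w2v.wv[vocab])

# Co-occurrence edges within a window of 2, boosted when both words share a cluster.
W = np.zeros((len(vocab), len(vocab)))
for sent in sentences:
    for i, w in enumerate(sent):
        for v in sent[max(0, i - 2):i]:
            boost = 2.0 if cluster[idx[w]] == cluster[idx[v]] else 1.0
            W[idx[w], idx[v]] += boost
            W[idx[v], idx[w]] += boost

P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)   # weight transfer probabilities
score = np.full(len(vocab), 1.0 / len(vocab))
for _ in range(50):                                        # TextRank power iteration
    score = 0.15 / len(vocab) + 0.85 * P.T @ score

print([vocab[i] for i in np.argsort(-score)[:5]])          # top-5 keyword candidates
```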
10. Research on a Spam Filtering Method Fusing LDA and Word2vector (Cited: 1)
Authors: 林建洪, 翟建桐, 徐菁. 《网络安全技术与应用》, 2017, No. 3, pp. 73-75 (3 pages).
Building on traditional spam filtering techniques, this paper proposes a document vector that fuses the LDA topic model and the Word2vector model. The document-topic matrices of different dimensions obtained from LDA, the word vectors obtained from Word2vector, and the fused document vector are used as feature inputs to a support vector machine and to logistic regression. Analysis of eight groups of controlled experiments shows that the fused document vector combined with the support vector machine achieves the highest accuracy, filtering spam precisely and reducing the harm of spam to individuals and society.
Keywords: LDA topic model, word2vector, spam, support vector machine
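A minimal sketch of the fused document vector: the LDA document-topic distribution concatenated with the mean Word2vector embedding of each mail, fed to an SVM; the toy mails, topic count, and vector size are illustrative:

```python
import numpy as np
from gensim.models import Word2Vec                      # gensim >= 4.0
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC

mails = ["win a free prize click now", "cheap pills discount offer click",
         "meeting moved to friday afternoon", "please review the attached report",
         "free lottery winner claim prize", "lunch with the project team tomorrow"]
labels = [1, 1, 0, 0, 1, 0]                             # 1 = spam

tokens = [m.split() for m in mails]
w2v = Word2Vec(tokens, vector_size=16, window=3, min_count=1, seed=1)
w2v_doc = np.array([np.mean([w2v.wv[t] for t in doc], axis=0) for doc in tokens])

counts = CountVectorizer().fit_transform(mails)
lda_doc = LatentDirichletAllocation(n_components=3, random_state=1).fit_transform(counts)

X = np.hstack([lda_doc, w2v_doc])                       # fused document vector
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.score(X, labels))
```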
11. WEDeepT3: predicting type Ⅲ secreted effectors based on word embedding and deep learning
Authors: Xiaofeng Fu, Yang Yang. Quantitative Biology, CAS/CSCD, 2019, No. 4, pp. 293-301 (9 pages).
Background: Type Ⅲ secreted effectors (T3SEs) are among the indispensable proteins in the growth and reproduction of Gram-negative bacteria. In particular, the pathogenesis of Gram-negative bacteria depends on T3SEs: by injecting T3SEs into a host cell, the host cell's immunity can be destroyed. The high diversity of T3SE sequences and the lack of defined secretion signals make them difficult to identify and predict, and the study of the pathological systems associated with T3SEs remains a hot topic in bioinformatics. Some computational tools have been developed to meet the growing demand for recognizing T3SEs and studying type Ⅲ secretion systems (T3SS). Although these tools can help biological experiments in certain procedures, there is still room for improvement, even for the current best model, as existing methods adopt hand-designed features and traditional machine learning methods. Methods: In this study, we propose a powerful predictor based on deep learning methods, called WEDeepT3. Our work consists of three key steps. First, we train word embedding vectors for protein sequences on a large-scale amino acid sequence database. Second, we combine the word vectors with traditional features extracted from protein sequences, like PSSM, to construct a more comprehensive feature representation. Finally, we construct a deep neural network model to predict type Ⅲ secreted effectors. Results: The feature representation of WEDeepT3 consists of both word embeddings and position-specific features. Working together with convolutional neural networks, the new model achieves superior performance to the state-of-the-art methods, demonstrating the effectiveness of the new feature representation and the powerful learning ability of deep models. Conclusion: WEDeepT3 exploits both the semantic information of k-mer fragments and the evolutionary information of protein sequences to accurately differentiate between T3SEs and non-T3SEs. WEDeepT3 is available at bcmi.sjtu.edu.cn/~yangyang/WEDeepT3.html.
Keywords: type Ⅲ secreted effectors, word2vector, PSSM, feature representation
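A minimal sketch of the first step described above: protein sequences are split into overlapping k-mer "words" and Word2Vec is trained on them; the sequences and k = 3 are illustrative, and the PSSM features and CNN from the paper are not shown:

```python
from gensim.models import Word2Vec   # gensim >= 4.0

def kmers(seq, k=3):
    """Overlapping k-mer 'words' of a protein sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

proteins = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
            "MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERS",
            "MKVLAAGIVGLLLAAGPAAAQDTSRLPVADLAG"]
corpus = [kmers(p) for p in proteins]
model = Word2Vec(corpus, vector_size=32, window=5, min_count=1, seed=1)

# Each 3-mer now has a learned vector; a sequence can be embedded, for example,
# by averaging its k-mer vectors before combining with PSSM features.
print(model.wv["MKT"][:4])
```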
12. Effective Vietnamese Sentiment Analysis Model Using Sentiment Word Embedding and Transfer Learning
Authors: Yong Huang, Siwei Liu, Liangdong Qu, Yongsheng Li. 《国际计算机前沿大会会议论文集》, 2020, No. 2, pp. 36-46 (11 pages).
Sentiment analysis is one of the most popular fields in NLP, and with the development of computer software and hardware, its applications are increasingly extensive. A supervised corpus has a positive effect on model training, but such corpora are prohibitively expensive to produce manually. This paper proposes a deep learning sentiment analysis model based on transfer learning. It represents the sentiment and semantics of words and improves Vietnamese sentiment analysis by using an English corpus. Semantic vectors are generated with the open-source tool Word2Vec, and sentiment vectors are built through an LSTM with an attention mechanism to obtain sentiment word vectors. Using shared parameters, the model is pre-trained with the English corpus. Finally, the sentiment of the text is classified by a stacked Bi-LSTM with an attention mechanism, taking the sentiment word vectors as input. Experiments show that the model can effectively improve the performance of Vietnamese sentiment analysis under limited language resources.
Keywords: sentiment analysis, long short-term memory, attention mechanism, sentiment word vector, transfer learning
13. Cardinality Estimator: Processing SQL with a Vertical Scanning Convolutional Neural Network (Cited: 5)
Authors: Shao-Jie Qiao, Guo-Ping Yang, Nan Han, Hao Chen, Fa-Liang Huang, Kun Yue, Yu-Gen Yi, Chang-An Yuan. Journal of Computer Science & Technology, SCIE/EI/CSCD, 2021, No. 4, pp. 762-777 (16 pages).
Although popular database systems perform well on query optimization, they still produce poor query execution plans when the join operations across multiple tables are complex. Bad execution planning usually results from bad cardinality estimation. The cardinality estimation models in traditional databases cannot provide high-quality estimates because they cannot effectively capture the correlations between multiple tables. Recently, state-of-the-art learning-based cardinality estimation has been shown to work better than the traditional empirical methods, basically by using deep neural networks to compute the relationships and correlations of tables. In this paper, we propose a vertical scanning convolutional neural network (abbreviated as VSCNN) to capture the relationships between words in the word vector in order to generate a feature map. The proposed learning-based cardinality estimator converts Structured Query Language (SQL) queries from a sentence to a word vector; table names are encoded with one-hot encoding and the samples as bitmaps, and these are merged to obtain sufficient semantic information from data samples. In particular, the feature map obtained by VSCNN contains semantic information about the tables, joins, and predicates of SQL queries. To further improve the accuracy of cardinality estimation, we propose a negative sampling method for training the word vector by gradient descent from the base table, compressed into a bitmap. Extensive experiments show that the q-error of the proposed vertical scanning convolutional neural network based model is reduced by at least 14.6% compared with the estimators in traditional databases.
Keywords: cardinality estimation, word vector, vertical scanning convolutional neural network, sampling method
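A minimal sketch of the query featurization described above: table names are one-hot encoded, sampled base-table rows become a predicate bitmap, and the pieces are merged into one feature vector; the toy schema, query, and sample are assumptions, and the VSCNN model itself is not shown:

```python
import numpy as np

tables = ["title", "cast_info", "movie_info"]             # toy schema
query_tables = ["title", "movie_info"]                     # tables referenced by the query
table_onehot = np.array([1.0 if t in query_tables else 0.0 for t in tables])

# Bitmap over sampled base-table rows: 1 if the row satisfies the query predicate.
sample_years = np.array([1994, 2001, 2010, 1987, 2015, 1999])   # sampled production_year values
bitmap = (sample_years > 2000).astype(float)                     # predicate: production_year > 2000

feature_vector = np.concatenate([table_onehot, bitmap])
print(feature_vector)    # one input row for a learned cardinality estimator
```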
14. Online Latent Dirichlet Allocation Model Based on Sentiment Polarity Time Series
Authors: HUANG Bo, JU Jiaji, CHEN Huan, ZHU Yimin, LIU Jin, SHI Zhicai. Wuhan University Journal of Natural Sciences, CAS/CSCD, 2021, No. 6, pp. 464-472 (9 pages).
The Product Sensitive Online Latent Dirichlet Allocation model (PSOLDA) proposed in this paper mainly uses the sentiment polarity of topic words in review text to improve the accuracy of topic evolution. First, Latent Dirichlet Allocation (LDA) is used to obtain the distribution of topic words in the current time window. Second, the word2vec word vector is used as auxiliary information to determine sentiment polarity and obtain the sentiment polarity distribution of the current topic. Finally, the sentiment polarity changes of the topics between the previous and next time windows are mapped to sentiment factors, which control the distribution of topic words in the next time window. The experimental results show that the PSOLDA model decreases the probability distribution by 0.1601, while Online Twitter LDA increases it by only 0.0699. The proposed topic evolution method, which integrates the sentiment information of topic words, outperforms the traditional model.
Keywords: topic evolution, sentiment factors, word vector, Latent Dirichlet Allocation (LDA)
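A minimal sketch of the auxiliary word2vec step described above: a topic word's polarity is estimated from its similarity to small positive and negative seed sets, and a topic's polarity is the average over its top words; the review corpus and seed lists are illustrative:

```python
import numpy as np
from gensim.models import Word2Vec   # gensim >= 4.0

reviews = [["battery", "life", "is", "great", "and", "charging", "is", "fast"],
           ["screen", "is", "terrible", "and", "battery", "drains", "fast"],
           ["camera", "is", "great", "but", "speaker", "is", "awful"]]
w2v = Word2Vec(reviews, vector_size=32, window=3, min_count=1, seed=1)

pos_seeds, neg_seeds = ["great"], ["terrible", "awful"]

def polarity(word):
    """Similarity to positive seeds minus similarity to negative seeds."""
    pos = np.mean([w2v.wv.similarity(word, s) for s in pos_seeds])
    neg = np.mean([w2v.wv.similarity(word, s) for s in neg_seeds])
    return pos - neg                      # > 0 leans positive, < 0 leans negative

topic_words = ["battery", "charging", "screen"]   # e.g., top words of one LDA topic
print({w: round(float(polarity(w)), 3) for w in topic_words})
print("topic polarity:", float(np.mean([polarity(w) for w in topic_words])))
```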