2,147 articles found
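Topic Structure and Evolution of Archival Science in China Based on Topic Models Cited by: 4
1
Authors: 董克, 韩宇姝 《信息资源管理学报》 CSSCI 2017, No. 3, pp. 97-105 (9 pages)
Text content analysis can effectively reveal the topic structure of a discipline's research and the development of its knowledge. This paper applies topic models and time-series analysis to mine the text of papers published over the past 10 years in two CSSCI source journals in archival science. The results show that combining these methods effectively identifies the research topics of a discipline and reveals how they develop. Research in Chinese archival science over the past 10 years has concentrated on 12 topics, including disciplinary paradigm research, electronic records management, and archival information services. Analysis of the temporal distribution of the topics reveals how they evolved. The paper further summarizes the main caveats in applying these methods and offers corresponding recommendations.
Keywords: topic model; disciplinary structure; topic structure; topic evolution; archival science research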
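Research Progress in Foreign Information Resources Management: An Analysis Based on the BERTopic Model Cited by: 2
2
Authors: 杨思洛, 吴丽娟 《情报理论与实践》 CSSCI PKU Core 2024, No. 2, pp. 189-197 (9 pages)
[Purpose/Significance] This article extracts topics from existing research, maps the main research directions, and shows trends in topic popularity, providing a reference for understanding and assessing the current state and trends of information resources management (IRM) research outside China. [Method/Process] The emerging BERTopic model is used to extract and identify topics from IRM-related literature in the WoS database for 2013-2022. Research directions are delineated using the associated topic words and inter-topic distances, and a dynamic topic model is used to reveal how the field has evolved. [Results/Conclusions] IRM research abroad over the past 10 years falls into 59 topics, which can be grouped into 7 directions: information technology and applications; enterprise information management; library management and services; health information management; information users and services; basic theory and methods of IRM; and bibliometrics and evaluation. The popularity of most topics tends to be stable. Some topics, such as digitization and open data, are gradually rising, while a few, such as outsourcing and knowledge management, are declining.
Keywords: foreign information resources management; topic model; BERTopic; research topics; research progress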
Mining User Interest in Microblogs with a User-Topic Model Cited by: 17
3
Authors: HE Li, JIA Yan, HAN Weihong, DING Zhaoyun 《China Communications》 SCIE CSCD 2014, No. 8, pp. 131-144 (14 pages)
Microblogs have become an important platform for people to publish and transform information and to acquire knowledge. This paper focuses on the problem of discovering user interest in microblogs. We propose a topic mining model based on Latent Dirichlet Allocation (LDA), named the user-topic model. For each user, interests are divided into two parts according to the way the microblogs are generated: original interest and retweet interest. We present a Gibbs sampling implementation for inferring the parameters of our model, and discover not only a user's original interest but also the retweet interest. We then combine original interest and retweet interest to compute interest words for users. Experiments on a dataset of Sina microblogs demonstrate that our model discovers user interest effectively and outperforms existing topic models on this task. We also find that original interest and retweet interest are similar and that the topics of interest contain user labels. The interest words discovered by our model reflect user labels, but their range is much broader.
Keywords: microblogs; topic mining; user interest; LDA; user-topic model
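The abstract above mentions Gibbs sampling for inferring topic-model parameters. As a rough illustration only, the following is a minimal sketch of collapsed Gibbs sampling for standard LDA, not the paper's user-topic model. The function name `lda_gibbs`, the hyperparameter defaults, and the toy documents are all assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative sketch)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]            # n_{d,k}
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                          # n_k
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional p(z = t | rest), up to a constant factor.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["sports", "game", "team", "game"],
    ["team", "sports", "score"],
    ["election", "vote", "party"],
    ["vote", "party", "election", "party"],
]
theta, topic_word = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` is a document's estimated topic mixture. A model that separates original and retweet interest would need additional latent variables beyond this sketch.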
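Image Classification Based on a Supervised Topic Model Cited by: 1
4
Authors: 付勋, 宋俊德 《软件》 2013, No. 12, pp. 253-255 (3 pages)
In recent years, topic models, typified by LDA, have been widely applied in both image and text processing. Compared with traditional machine learning methods, LDA has few parameters and strong expressive power. As a generative model, it can also effectively mimic the way humans learn and readily incorporate prior knowledge. Supervised LDA combines generative and discriminative modeling and is a general-purpose classification method. In this work, Dense-SIFT features serve as the low-level features; within the bag-of-words framework, a dictionary is built with the k-means algorithm, a supervised LDA model is trained, and the approach is evaluated on standard image datasets. The evaluation results show that it performs well on image classification tasks.
Keywords: image classification; topic model; supervised model; 词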
BURST-LDA: A NEW TOPIC MODEL FOR DETECTING BURSTY TOPICS FROM STREAM TEXT Cited by: 3
5
Authors: Qi Xiang, Huang Yu, Chen Ziyan, Liu Xiaoyan, Tian Jing, Huang Tinglei, Wang Hongqi 《Journal of Electronics(China)》 2014, No. 6, pp. 565-575 (11 pages)
Topic models such as Latent Dirichlet Allocation (LDA) have been successfully applied to many text mining tasks for extracting topics embedded in corpora. However, existing topic models generally cannot discover bursty topics that experience a sudden increase during a period of time. In this paper, we propose a new topic model named Burst-LDA, which simultaneously discovers topics and reveals their burstiness through explicitly modeling each topic's burst states with a first-order Markov chain and using the chain to generate the topic proportion of documents in a Logistic Normal fashion. A Gibbs sampling algorithm is developed for the posterior inference of the proposed model. Experimental results on a news data set show our model can efficiently discover bursty topics, outperforming the state-of-the-art method.
Keywords: text mining; burst detection; topic model; graphical model; Bayesian inference
Anomaly detection in traffic surveillance with sparse topic model Cited by: 4
6
Authors: XIA Li-min, HU Xiang-jie, WANG Jun 《Journal of Central South University》 SCIE EI CAS CSCD 2018, No. 9, pp. 2245-2257 (13 pages)
Most research on anomaly detection has focused on events that differ from their spatial-temporal neighboring events. It is still a significant challenge to detect anomalies that involve multiple normal events interacting in an unusual pattern. In this work, a novel unsupervised method based on a sparse topic model was proposed to capture motion patterns and detect anomalies in traffic surveillance. Scale-invariant feature transform (SIFT) flow was used to improve the dense trajectory in order to extract interest points and the corresponding descriptors with less interference. To strengthen the relationship between interest points on the same trajectory, the Fisher kernel method was applied to obtain a representation of each trajectory, which was quantized into visual words. Then the sparse topic model was proposed to explore the latent motion patterns and achieve a sparse representation of the video scene. Finally, two anomaly detection algorithms were compared, based on video clip detection and visual word analysis respectively. Experiments were conducted on the QMUL Junction and AVSS datasets. The results demonstrated the superior efficiency of the proposed method.
Keywords: motion pattern; sparse topic model; SIFT flow; dense trajectory; Fisher kernel
Enhancing Collaborative Filtering via Topic Model Integrated Uniform Euclidean Distance Cited by: 1
7
Authors: Tieliang Gao, Bo Cheng, Junliang Chen, Ming Chen 《China Communications》 SCIE CSCD 2017, No. 11, pp. 48-58 (11 pages)
Recommendation systems can greatly alleviate "information overload" in the big data era. Existing recommendation methods, however, typically focus on predicting missing rating values by analyzing the dualistic user-item relationship, which neglects an important fact: the latent interests of users can influence their rating behaviors. Moreover, traditional recommendation methods easily suffer from the high-dimensionality problem and the cold-start problem. To address these challenges, we propose a PBUED (PLSA-Based Uniform Euclidean Distance) scheme, which uses a topic model and uniform Euclidean distance to recommend suitable items to users. The solution first employs probabilistic latent semantic analysis (PLSA) to extract users' interests, and users with different interests are divided into different subgroups. Then, the uniform Euclidean distance is adopted to compute the similarity of users in the same interest subset. Finally, the missing rating values are predicted by aggregating similar neighbors' ratings. We evaluate PBUED on two datasets, and the experimental results show that PBUED achieves better prediction and ranking performance than other approaches.
Keywords: recommendation system; topic model; user interest; uniform Euclidean distance
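The last two steps described above (distance-based user similarity within a subgroup, then aggregating neighbors' ratings) can be sketched roughly as follows. This is only an illustration under stated assumptions: the distance here is a plain Euclidean distance averaged over co-rated items, which may differ from the paper's "uniform Euclidean distance"; the PLSA subgrouping is assumed to have been done already; and the names `distance`, `predict`, and the toy ratings are hypothetical.

```python
import math


def distance(u, v):
    """Root-mean-square Euclidean distance over co-rated items, or None."""
    common = set(u) & set(v)
    if not common:
        return None
    return math.sqrt(sum((u[i] - v[i]) ** 2 for i in common) / len(common))


def predict(ratings, user, item, k=2):
    """Similarity-weighted average of the k nearest neighbours' ratings."""
    candidates = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        d = distance(ratings[user], other_ratings)
        if d is not None:
            # Map distance to a similarity in (0, 1]; an assumed convention.
            candidates.append((1.0 / (1.0 + d), other_ratings[item]))
    candidates.sort(reverse=True)
    top = candidates[:k]
    if not top:
        return None
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


# Users assumed to belong to one interest subgroup (hypothetical data).
ratings = {
    "ann": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "cat": {"a": 1, "b": 5, "c": 2},
}
prediction = predict(ratings, "ann", "c")
```

In this toy case the prediction for "ann" is pulled toward "bob", whose ratings on the shared items match hers exactly, rather than toward "cat".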
Assessing citizen science opportunities in forest monitoring using probabilistic topic modelling Cited by: 1
8
Authors: Stefan Daume, Matthias Albert, Klaus von Gadow 《Forestry Studies in China》 CAS 2014, No. 2, pp. 93-104 (12 pages)
Background: With mounting global environmental, social and economic pressures, the resilience and stability of forests, and thus the provisioning of vital ecosystem services, are increasingly threatened. Intensified monitoring can help to detect ecological threats and changes earlier, but monitoring resources are limited. Participatory forest monitoring with the help of "citizen scientists" can provide additional resources for forest monitoring and at the same time help to communicate with stakeholders and the general public. Examples of citizen science projects in the forestry domain can be found, but a solid, applicable larger framework to utilise public participation in the area of forest monitoring seems to be lacking. We propose that a better understanding of shared and related topics in citizen science and forest monitoring might be a first step towards such a framework. Methods: We conduct a systematic meta-analysis of 1015 publication abstracts addressing "forest monitoring" and "citizen science" in order to explore the combined topical landscape of these subjects. We employ topic modelling, an unsupervised probabilistic machine learning method, to identify latent shared topics in the analysed publications. Results: We find that large shared topics exist, but that these are primarily topics that would be expected in scientific publications in general. Common domain-specific topics are under-represented and indicate a topical separation of the two document sets on "forest monitoring" and "citizen science" and thus the represented domains.
While topic modelling as a method proves to be a scalable and useful analytical tool, we propose that our approach could deliver even more useful data if a larger document set and full-text publications were available for analysis. Conclusions: We propose that these results, together with the observation of non-shared but related topics, point to under-utilised opportunities for public participation in forest monitoring. Citizen science could be applied as a versatile tool in forest ecosystem monitoring, complementing traditional forest monitoring programmes, assisting early threat recognition and helping to connect forest management with the general public. We conclude that our presented approach should be pursued further as it may aid the understanding and setup of citizen science efforts in the forest monitoring domain.
Keywords: forest monitoring; citizen science; participatory forest monitoring; probabilistic topic modelling; text analysis
TG-SMR: A Text Summarization Algorithm Based on Topic and Graph Models Cited by: 1
9
Authors: Mohamed Ali Rakrouki, Nawaf Alharbe, Mashael Khayyat, Abeer Aljohani 《Computer Systems Science & Engineering》 SCIE EI 2023, No. 4, pp. 395-408 (14 pages)
Automation has recently come to be considered vital in most fields, since computing methods play a significant role in facilitating work such as automatic text summarization. However, most of the computing methods used in real systems are based on graph models, which are characterized by their simplicity and stability. Thus, this paper proposes an improved extractive text summarization algorithm based on both topic and graph models. The methodology of this work consists of two stages. First, the well-known TextRank algorithm is analyzed and its shortcomings are investigated. Then, an improved method is proposed with a new computational model of sentence weights. The experiments were carried out on the standard DUC2004 and DUC2006 datasets, and the results were compared to four text summarization methods. Finally, through experiments on the DUC2004 and DUC2006 datasets, our proposed improved graph model algorithm TG-SMR (Topic Graph-Summarizer) is compared to other text summarization systems. The experimental results show that the proposed TG-SMR algorithm achieves higher ROUGE scores. It is foreseen that the TG-SMR algorithm will open a new horizon concerning performance on ROUGE evaluation indicators.
Keywords: natural language processing; text summarization; graph model; topic model
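For readers unfamiliar with the TextRank baseline that the abstract above analyzes, the following is a minimal sketch of baseline TextRank sentence scoring. It is not the TG-SMR algorithm and does not include the paper's new sentence-weight model. The tokenizer, the function names, and the sample sentences are assumptions made for this example.

```python
import math
import re


def similarity(a, b):
    """Word-overlap similarity normalised by the log sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # skip very short sentences to avoid a zero denominator
    return len(set(a) & set(b)) / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration on the weighted sentence graph."""
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion
        # to the normalised edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


# Sample sentences (hypothetical, for illustration only).
sentences = [
    "Topic models extract latent topics from text.",
    "Graph models rank sentences by their links to other sentences.",
    "Topic models and graph models can be combined to rank sentences.",
    "The weather was pleasant.",
]
scores = textrank(sentences)
best = scores.index(max(scores))
```

An extractive summary would then keep the highest-scoring sentences. Here the third sentence, which shares words with both of the first two, ranks highest, while the unrelated fourth sentence keeps only the base score.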
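A Linked Text Classification Algorithm Based on iTopicModel
10
Authors: 梁鹏鹏, 柴玉梅, 王黎明 《计算机工程》 CAS CSCD PKU Core 2011, No. 21, pp. 124-125, 130 (3 pages)
To address the insufficient consideration that traditional text classification methods give to the link relationships between documents, a linked text classification algorithm based on iTopicModel is proposed. The class represented by each topic is determined from the probabilities with which documents of known class belong to the topics. Documents are then classified using the topic membership probabilities of the documents to be classified together with their text information. Experimental results show that when the links between documents strongly influence class information, TC-iTM outperforms traditional text classification methods.
Keywords: text classification; document network; topic model; EM algorithm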
Self-Adaptive Topic Model: A Solution to the Problem of "Rich Topics Get Richer" Cited by: 1
11
Authors: FANG Ying 《China Communications》 SCIE CSCD 2014, No. 12, pp. 35-43 (9 pages)
The problem of "rich topics get richer" (RTGR) is common in topic models and leads to an incorrect topic distribution if the assignment process is left uncorrected. In the standard LDA (Latent Dirichlet Allocation) model, each word in all the documents is given the same statistical weight. In fact, words have different impacts on different topics. Guided by this observation, we extend ILDA (Infinite LDA) by considering the biasing role of words in separating topics. We propose a self-adaptive topic model specifically to overcome the RTGR problem. The model proposed in this paper addresses three issues: (1) the number of topics changes with the document collection, which suits dynamic data; (2) words have discriminating attributes with respect to the topic distribution; (3) a self-adaptive method is used to realize automatic re-sampling. To verify our model, we design a topic evolution analysis system that provides the following functions: topic classification in each cycle, topic correlation between adjacent cycles, and strength calculation of the sub-topics in order. Experiments on both the NIPS corpus and our self-built news collections showed that the system meets these requirements and that the results are feasible.
Keywords: topic model; infinite Latent Dirichlet Allocation; Dirichlet process; topic evolution
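Detecting Derivative Public Opinion from Cyberviolence Incidents Based on the BERTopic Model Cited by: 2
12
Authors: 胡凯茜, 李欣, 王龙腾 《情报杂志》 CSSCI PKU Core 2024, No. 7, pp. 146-153 (8 pages)
[Research purpose] Promptly detecting and analyzing the derivative public opinion arising from cyberviolence incidents within massive user-generated content can provide theoretical support for analyzing the evolution of public opinion event chains, for assessing and intervening in similar public opinion, and for monitoring and early warning of derivative events. [Research method] The BERTopic model is used to model topics in short texts, and clustering is used to display the latent hierarchical structure of the topics. A measure of topic derivation degree is designed based on the cosine similarity of word vectors. It is combined with the strength of word co-occurrence networks in capturing document-word level information and with Sankey diagrams, which show the evolution of public opinion intuitively, to measure the influence and derivation relationships among topics. [Research conclusions] In controlled experiments with several groups of topic models on open-source datasets, the BERTopic model improved the average score on short-text modeling and downstream tasks by 2.13%. In an application to trending cyberviolence incidents, multi-dimensional fine-grained analysis and interactive visualization intuitively present the topic clusters, word-sense associations, and evolutionary trends of the incidents, enabling the detection and analysis of derivative public opinion from cyberviolence incidents.
Keywords: online public opinion; cyberviolence; derivative public opinion; public opinion monitoring; short text; topic modeling; BERTopic model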
A Semi-Supervised Topic Model Incorporating Sentiment and Dynamic Characteristic
13
Authors: Lanshan Zhang, Xi Ding, Ye Tian, Xiangyang Gong, Wendong Wang 《China Communications》 SCIE CSCD 2016, No. 12, pp. 162-175 (14 pages)
With the rapid popularization of social applications, various kinds of social media have developed into an important platform for publishing information and expressing opinions. Detecting hidden topics in the huge amount of user-generated content is of great commercial value and social significance. However, traditional text analysis approaches focus only on the statistical correlation between words and ignore the sentiment tendency and the temporal properties, which may have great effects on topic detection results. This paper proposes a Dynamic Sentiment-Topic (DST) model, which can not only detect and track dynamic topics but also analyze the shift of the public's sentiment tendency towards a certain topic. The Expectation-Maximization algorithm is used in the DST model to estimate the latent distribution, and Gibbs sampling is used to sample the new document set and update the hyperparameters and distributions. Experiments are conducted on a real dataset, and the results show that the DST model outperforms existing algorithms in terms of topic detection and sentiment accuracy.
Keywords: dynamic sentiment-topic model; sentiment analysis; topic detection
NON-PARAMETRIC TOPIC MODEL FOR DISCOVERING GEOGRAPHICAL TOPIC VARIATIONS
14
Authors: Qi Xiang, Huang Yu, Song Jun, Huang Tinglei, Wang Hongqi, Fu Kun 《Journal of Electronics(China)》 2014, No. 6, pp. 576-586 (11 pages)
This paper presents a non-parametric topic model that captures not only the latent topics in text collections, but also how the topics change over space. Unlike other recent work that relies on either Gaussian assumptions or discretization of locations, here topics are associated with a distance dependent Chinese Restaurant Process (ddCRP), and for each document, the observed words are influenced by the document's GPS-tag. Our model allows both an unbounded number and a flexible distribution of the geographical variations of the topics' content. We develop a Gibbs sampler for the proposed model and compare it with existing models on a real data set.
Keywords: text mining; topic model; geographical topics; Bayesian non-parameter
A Phrase Topic Model Based on Distributed Representation
15
Authors: Jialin Ma, Jieyi Cheng, Lin Zhang, Lei Zhou, Bolun Chen 《Computers, Materials & Continua》 SCIE EI 2020, No. 7, pp. 455-469 (15 pages)
Traditional topic models have been widely used for analyzing semantic topics in electronic documents. However, the topic words they produce suffer from poor readability and consistency, and only domain experts are likely to guess their meaning. In fact, phrases are the main unit through which people express semantics. This paper presents Distributed Representation-Phrase Latent Dirichlet Allocation (DR-Phrase LDA), a phrase topic model. Specifically, the model enhances the semantic information of phrases via distributed representation. The experimental results show that the topics acquired by our model are more readable and consistent than those of other similar topic models.
Keywords: phrase; topic model; LDA; distributed representation; Gibbs sampling
Probit Normal Correlated Topic Model
16
Authors: Xingchen Yu, Ernest Fokoué 《Open Journal of Statistics》 2014, No. 11, pp. 879-888 (10 pages)
The logistic normal distribution has recently been adapted via the transformation of multivariate Gaussian variables to model the topical distribution of documents in the presence of correlations among topics. In this paper, we propose a probit normal alternative approach to modelling correlated topical structures. Our use of the probit model in the context of topic discovery is novel, as many authors have so far concentrated solely on the logistic model, partly due to the formidable inefficiency of the multinomial probit model even in the case of very small topical spaces. We herein circumvent the inefficiency of multinomial probit estimation by using an adaptation of the diagonal orthant multinomial probit in the topic models context, resulting in the ability of our topic modeling scheme to handle corpora with a large number of latent topics. An additional and very important benefit of our method lies in the fact that, unlike with the logistic normal model, whose non-conjugacy leads to the need for sophisticated sampling schemes, our approach exploits the natural conjugacy inherent in the auxiliary formulation of the probit model to achieve greater simplicity. The application of our proposed scheme to a well-known Associated Press corpus not only helps discover a large number of meaningful topics but also reveals the capturing of compellingly intuitive correlations among certain topics. Besides, our proposed approach lends itself to even further scalability thanks to various existing high-performance algorithms and architectures capable of handling millions of documents.
Keywords: topic model; Bayesian; Gibbs sampler; cumulative distribution function; probit; logit; diagonal orthant; efficient sampling; auxiliary variable; correlation structure; topic; vocabulary; conjugate; Dirichlet; Gaussian
Topic Modelling and Sentimental Analysis of Students’ Reviews
17
Authors: Omer S. Alkhnbashi, Rasheed Mohammad Nassr 《Computers, Materials & Continua》 SCIE EI 2023, No. 3, pp. 6835-6848 (14 pages)
Globally, educational institutions have reported a dramatic shift to online learning in an effort to contain the COVID-19 pandemic. The fundamental concern has been the continuance of education. As a result, several novel solutions have been developed to address technical and pedagogical issues. However, these were not the only difficulties that students faced. The implemented solutions involved the operation of the educational process with less regard for students’ changing circumstances, which obliged them to study from home. Students should be asked to provide a full list of their concerns. As a result, student reflections, including those from Saudi Arabia, have been analysed to identify obstacles encountered during the COVID-19 pandemic. However, most of the analyses relied on closed-ended questions, which limited student involvement. To delve into students’ responses, this study used open-ended questions, a qualitative method (content analysis), a quantitative method (topic modelling), and a sentimental analysis. This study also looked at students’ emotional states during and after the COVID-19 pandemic. In terms of determining trends in students’ input, the results showed that quantitative and qualitative methods produced similar outcomes. Students had unfavourable sentiments about studying during COVID-19 and positive sentiments about face-to-face study. Furthermore, topic modelling has revealed that the majority of difficulties are more related to the environment (home) and social life. Students were less accepting of online learning. As a result, it is possible to conclude that face-to-face study still attracts students and provides benefits that online study cannot, such as social interaction and effective eye-to-eye communication.
Keywords: topic modelling; sentimental analysis; COVID-19; students’ input
A Structural Topic Model for Exploring User Satisfaction with Mobile Payments
18
Authors: Jang Hyun Kim, Jisung Jang, Yonghwan Kim, Dongyan Nan 《Computers, Materials & Continua》 SCIE EI 2022, No. 11, pp. 3815-3826 (12 pages)
This study explored user satisfaction with mobile payments by applying a novel structural topic model. Specifically, we collected 17,927 online reviews of a specific mobile payment service (i.e., PayPal). Then, we employed a structural topic model to investigate the relationship between the attributes extracted from online reviews and user satisfaction with mobile payment. Consequently, we discovered that “lack of reliability” and “poor customer service” tend to appear in negative reviews. In contrast, the terms “convenience,” “user-friendly interface,” “simple process,” and “secure system” tend to appear in positive reviews. On the basis of information system success theory, we categorized the topics “convenience,” “user-friendly interface,” and “simple process” as system quality. In addition, “poor customer service” was categorized as service quality. Furthermore, based on previous studies of trust and security, “lack of reliability” and “secure system” were categorized as trust and security, respectively. These outcomes indicate that users are satisfied when they perceive the system quality and security of a specific mobile payment service to be high. Conversely, users are dissatisfied when they feel that its service quality and reliability are lacking. Overall, our research implies that a novel structural topic model is an effective method for exploring the mobile payment user experience.
Keywords: mobile payment; user satisfaction; online review; structural topic model
Research on high-performance English translation based on topic model
19
Authors: Yumin Shen, Hongyu Guo 《Digital Communications and Networks》 SCIE CSCD 2023, No. 2, pp. 505-511 (7 pages)
Retelling extraction is an important branch of Natural Language Processing (NLP), and high-quality retelling resources are very helpful for improving the performance of machine translation. However, traditional methods based on the bilingual parallel corpus often ignore the document background in the process of retelling acquisition and application. To solve this problem, we introduce topic model information into the translation mode and propose a topic-based statistical machine translation method to improve translation performance. In this method, Probabilistic Latent Semantic Analysis (PLSA) is used to obtain the co-occurrence relationship between words and documents by hybrid matrix decomposition. Then we design a decoder to simplify the decoding process. Experiments show that the proposed method can effectively improve the accuracy of translation.
Keywords: machine translation; topic model; statistical machine translation; bilingual word vector; retelling
Identification of Topics from Scientific Papers through Topic Modeling
20
Authors: Denis Luiz Marcello Owa 《Open Journal of Applied Sciences》 2021, No. 4, pp. 541-548 (8 pages)
Topic modeling is a probabilistic model that identifies topics covered in text(s). In this paper, topics were loaded from two implementations of topic modeling, namely, Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA). This analysis was performed on a corpus of 1000 academic papers written in English, obtained from the PLOS ONE website, in the areas of Biology, Medicine, Physics and Social Sciences. The objective is to verify whether the four academic fields were represented in the four topics obtained by topic modeling. The four topics obtained from Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) did not represent the four academic fields.
Keywords: topic modeling; corpus linguistics; Gensim; LSI; LDA