18 articles found
Semi-supervised Document Clustering Based on Latent Dirichlet Allocation (LDA) (Cited by: 2)
1
Authors: 秦永彬, 李解, 黄瑞章, 李晶. Source: Journal of Donghua University (English Edition) (EI, CAS), 2016, Issue 5, pp. 685-688, 4 pages
To discover personalized document structure with the consideration of user preferences, user preferences were captured by a limited amount of instance-level constraints and given as interested and uninterested key terms. A semi-supervised document clustering approach based on the latent Dirichlet allocation (LDA) model, namely pLDA, was developed, guided by the user-provided key terms. A generalized Polya urn (GPU) model was proposed to integrate the user preferences into the document clustering process. A Gibbs sampler was investigated to infer the document collection structure. Experiments on real datasets were conducted to explore the performance of pLDA. The results demonstrate that the pLDA approach is effective.
Keywords: latent Dirichlet allocation (LDA); semi-supervised learning; document clustering
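The abstract above mentions a Gibbs sampler for inferring the document collection structure. As a rough illustration only, the following is a minimal sketch of collapsed Gibbs sampling for plain LDA; it does not include the pLDA key-term guidance or the generalized Polya urn step, and the function name `lda_gibbs`, its parameters, and the default hyperparameters are assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA (hypothetical helper, not pLDA).

    docs is a list of documents, each a list of integer word ids in [0, vocab_size).
    Returns the document-topic and topic-word count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                              # n_k
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word
```

Each row of the returned document-topic table sums to that document's length, so normalizing a row gives a rough topic mixture that could be used to cluster documents.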
Fuzzy Based Latent Dirichlet Allocation for Intrusion Detection in Cloud Using ML
2
Authors: S. Ranjithkumar, S. Chenthur Pandian. Source: Computers, Materials & Continua (SCIE, EI), 2022, Issue 3, pp. 4261-4277, 17 pages
The growth of cloud in modern technology is drastic, provisioning services to various industries, where data security is considered a common issue that influences the intrusion detection system (IDS). IDS are considered an essential factor in fulfilling security requirements. Recently, diverse Machine Learning (ML) approaches have been used for modeling effectual IDS. Most IDS are based on ML techniques and are categorized as supervised or unsupervised. However, IDS with supervised learning is based on labeled data. This is considered a common drawback, and it fails to identify the attack patterns. Similarly, unsupervised learning fails to provide satisfactory outcomes. Therefore, this work concentrates on a semi-supervised learning model known as the Fuzzy based semi-supervised approach through Latent Dirichlet Allocation (F-LDA) for intrusion detection in cloud systems. This helps to resolve the aforementioned challenges. Initially, LDA gives better generalization ability for training the labeled data. Similarly, to handle the unlabelled data, a Fuzzy model has been adopted for analyzing the dataset. Here, preprocessing has been carried out to eliminate data redundancy over the network dataset. In order to validate the efficiency of F-LDA towards ID, this model is tested on the NSL-KDD Cup dataset, a common traffic dataset. Simulation is done in the MATLAB environment and gives better accuracy when compared with the benchmark standard dataset. The proposed F-LDA gives better accuracy and more promising outcomes than the prevailing approaches.
Keywords: cloud security; fuzzy model; latent Dirichlet allocation; preprocessing; NSL-KDD
PG-CODE: Latent Dirichlet Allocation Embedded Policy Knowledge Graph for Government Department Coordination (Cited by: 3)
3
Authors: Yilin Kang, Renwei Ou, Yi Zhang, Hongling Li, Shasha Tian. Source: Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, Issue 4, pp. 680-691, 12 pages
Government policy-group integration and policy-chain inference are significant to the execution of strategies in current Chinese society. Specifically, the coordination of hierarchical policies implemented among government departments is one of the key challenges to rural revitalization. In recent years, various well-established quantitative methods have been proposed to evaluate policy coordination, but the majority of these relied on manual analysis, which can lead to subjective results. Thus, in this paper, a novel approach called "policy knowledge graph for the coordination among the government departments" (PG-CODE) is proposed, which incorporates topic modeling into policy knowledge graphs. Similar to a knowledge graph, a policy knowledge graph uses a graph-structured data model to integrate policy discourse. With latent Dirichlet allocation embedding, a policy knowledge graph could capture the underlying topics of the policies. Furthermore, coordination strength and topic diffusion among hierarchical departments could be inferred from the PG-CODE, as it can provide a better representation of coordination within the policy space. We implemented and evaluated the PG-CODE in the field of rural innovation and entrepreneurship policy, and the results effectively demonstrate improved coordination among departments.
Keywords: policy knowledge graph; department coordination; topic diffusion; latent Dirichlet allocation; rural revitalization
Online Latent Dirichlet Allocation Model Based on Sentiment Polarity Time Series
4
Authors: HUANG Bo, JU Jiaji, CHEN Huan, ZHU Yimin, LIU Jin, SHI Zhicai. Source: Wuhan University Journal of Natural Sciences (CAS, CSCD), 2021, Issue 6, pp. 464-472, 9 pages
The Product Sensitive Online Dirichlet Allocation model (PSOLDA) proposed in this paper mainly uses the sentiment polarity of topic words in the review text to improve the accuracy of topic evolution. First, we use Latent Dirichlet Allocation (LDA) to obtain the distribution of topic words in the current time window. Second, the word2vec word vector is used as auxiliary information to determine the sentiment polarity and obtain the sentiment polarity distribution of the current topic. Finally, the sentiment polarity changes of the topics in the previous and next time windows are mapped to the sentiment factors, and the distribution of topic words in the next time window is controlled through them. The experimental results show that the PSOLDA model decreases the probability distribution by 0.1601, while Online Twitter LDA only increases by 0.0699. The topic evolution method proposed in this paper, which integrates the sentiment information of topic words, is better than the traditional model.
Keywords: topic evolution; sentiment factors; word vector; latent Dirichlet allocation (LDA)
Improving Parameter Estimation and Defensive Ability of Latent Dirichlet Allocation Model Training Under Rényi Differential Privacy
5
Authors: 黄涛, 赵素云, 陈红, 刘艺璇. Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2022, Issue 6, pp. 1382-1397, 16 pages
Latent Dirichlet allocation (LDA) is a topic model widely used for discovering hidden semantics in massive text corpora. Collapsed Gibbs sampling (CGS), as a widely used algorithm for learning the parameters of LDA, has the risk of privacy leakage. Specifically, word count statistics and updates of latent topics in CGS, which are essential for parameter estimation, could be employed by adversaries to conduct effective membership inference attacks (MIAs). Until now, two kinds of methods have been exploited in CGS to defend against MIAs: adding noise to word count statistics and utilizing inherent privacy. These two kinds of methods have their respective limitations. Noise sampled from the Laplacian distribution sometimes produces negative word count statistics, which render terrible parameter estimation in CGS. Utilizing inherent privacy could only provide weak guaranteed privacy when defending against MIAs. It is promising to propose an effective framework to obtain accurate parameter estimations with guaranteed differential privacy. The key issue of obtaining accurate parameter estimations when introducing differential privacy in CGS is making good use of the privacy budget such that a precise noise scale is derived. It is the first time that Rényi differential privacy (RDP) has been introduced into CGS, and we propose RDP-LDA, an effective framework for analyzing the privacy loss of any differentially private CGS. RDP-LDA could be used to derive a tighter upper bound of privacy loss than the overestimated results of existing differentially private CGS obtained by ε-DP. In RDP-LDA, we propose a novel truncated-Gaussian mechanism that keeps word count statistics non-negative. We also propose distribution perturbation, which could provide more rigorous guaranteed privacy than utilizing inherent privacy. Experiments validate that our proposed methods produce more accurate parameter estimation under the JS-divergence metric and obtain lower precision and recall when defending against MIAs.
Keywords: latent Dirichlet allocation; parameter estimation; membership inference attack; Rényi differential privacy
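The abstract above notes that additive noise can push word counts below zero and that a truncated Gaussian mechanism avoids this. The sketch below only illustrates the general idea of drawing Gaussian noise truncated so the noisy count stays non-negative, using simple rejection sampling. It is not the mechanism defined in the paper, it carries no privacy calibration or guarantee, and the function name `noisy_count_nonnegative` and its parameters are assumptions for this example.

```python
import random


def noisy_count_nonnegative(count, sigma, rng):
    """Return count plus Gaussian noise, resampled until the result is >= 0.

    Hypothetical helper: this samples from a Gaussian centered at `count`
    and truncated at zero. For count >= 0 each draw is accepted with
    probability at least 0.5, so the loop terminates quickly.
    """
    while True:
        noisy = count + rng.gauss(0.0, sigma)
        if noisy >= 0.0:
            return noisy
```

A call such as `noisy_count_nonnegative(3, 2.0, random.Random(0))` returns a perturbed count that can be used directly as a non-negative statistic, whereas an untruncated draw could be negative and would need clipping.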
Analysis of Public Sentiment regarding COVID-19 Vaccines on the Social Media Platform Reddit
6
Authors: Lucien Dikla Ngueleo, Jules Pagna Disso, Armel Ayimdji Tekemetieu, Justin Moskolaï Ngossaha, Michael Nana Kameni. Source: Journal of Computer and Communications, 2024, Issue 2, pp. 80-108, 29 pages
This study undertakes a thorough analysis of the sentiment within the r/Coronavirus subreddit community regarding COVID-19 vaccines on Reddit. We meticulously collected and processed 34,768 comments, spanning from November 20, 2020, to January 17, 2021, using sentiment calculation methods such as TextBlob and Twitter-RoBERTa-Base-sentiment to categorize comments into positive, negative, or neutral sentiments. The methodology involved the use of Count Vectorizer as a vectorization technique and the implementation of advanced ensemble algorithms like XGBoost and Random Forest, achieving an accuracy of approximately 80%. Furthermore, through latent Dirichlet allocation, we identified 23 distinct reasons for vaccine distrust among negative comments. These findings are crucial for understanding the community's attitudes towards vaccination and can guide targeted public health messaging. Our study not only provides insights into public opinion during a critical health crisis, but also demonstrates the effectiveness of combining natural language processing tools and ensemble algorithms in sentiment analysis.
Keywords: COVID-19 vaccine; TextBlob; Twitter-RoBERTa-Base-Sentiment; sentiment analysis; latent Dirichlet allocation
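Mining and Visualization of Key Points for Safety Hazard Inspection in Metro Construction Based on LDA (Cited by: 3)
7
Authors: 潘杏, 钟波涛, 黑永健, 骆汉宾. Source: 《土木建筑工程信息技术》, 2021, Issue 2, pp. 7-14, 8 pages
With the rapid construction of metro systems and the establishment of hazard inspection systems, a large number of hazard inspection records have accumulated. These records are verbose and cluttered, and the related work relies heavily on guidelines and expert experience, which requires substantial labor. To improve the efficiency of hazard inspection and of safety management decision-making, and to move inspection work toward full automation, this paper proposes a framework for automatically analyzing hazard inspection text based on text mining and visualization techniques. The framework consists of four steps. First, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm gives an overall summary of the keywords in the hazard descriptions. Second, keywords with high TF-IDF feature values are selected, and a Latent Dirichlet Allocation (LDA) model with Gibbs sampling identifies the latent topics and key inspection points in the large-scale hazard description corpus. Third, along the time dimension, Word Cloud (WC) techniques visualize the hazard descriptions and produce hazard word cloud evolution maps. Fourth, a Word Co-occurrence Network (WCN) model mines the co-occurrence relationships among hazards. The framework was applied and validated on the construction safety hazard inspection records of the Wuhan Metro from 2016 to 2018. The experimental results show that the framework effectively mines the key inspection points and visualization information corresponding to 34 hazard categories.
Keywords: safety management; metro construction safety; key points of hazard inspection; text mining; latent Dirichlet allocation model; data visualization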
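The fourth step in the abstract above mines co-occurrence relationships with a word co-occurrence network. One simple way to build the edge weights of such a network is to count how many records contain each pair of terms, as sketched below. The function name `cooccurrence_counts` and the record-level co-occurrence window are assumptions for this example; the paper may define co-occurrence differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered word pair, the number of documents containing both.

    docs is a list of tokenized records (lists of strings). Pairs are stored
    as alphabetically sorted tuples so (a, b) and (b, a) share one key.
    """
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts
```

The resulting counter can serve as a weighted edge list: each key is an edge between two terms and each value is the edge weight.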
News Text Topic Clustering Optimized Method Based on TF-IDF Algorithm on Spark (Cited by: 7)
8
Authors: Zhuo Zhou, Jiaohua Qin, Xuyu Xiang, Yun Tan, Qiang Liu, Neal N. Xiong. Source: Computers, Materials & Continua (SCIE, EI), 2020, Issue 1, pp. 217-231, 15 pages
Due to the slow processing speed of text topic clustering in a stand-alone architecture under the background of big data, this paper takes news text as the research object and proposes an LDA text topic clustering algorithm based on the Spark big data platform. Since the TF-IDF (term frequency-inverse document frequency) algorithm under Spark is irreversible with respect to word mapping, the mapped word indexes cannot be traced back to the original words. In this paper, an optimized TF-IDF method under Spark is proposed to ensure that the text words can be restored. Firstly, the text features are extracted by the TF-IDF algorithm combined with CountVectorizer as proposed in this paper, and then the features are input to the LDA (Latent Dirichlet Allocation) topic model for training. Finally, the text topic clustering is obtained. Experimental results show that for large data samples, the processing speed of LDA topic model clustering is improved on Spark. At the same time, compared with the LDA topic model based on word frequency input, the model proposed in this paper achieves a reduction in perplexity.
Keywords: news text topic clustering; Spark platform; CountVectorizer algorithm; TF-IDF algorithm; latent Dirichlet allocation model
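The abstract above turns on keeping the word-to-index mapping reversible so that TF-IDF feature indexes can be traced back to words. The following is a minimal pure-Python sketch of that idea, not Spark code and not the authors' implementation: an explicit vocabulary dictionary is built first, TF-IDF weights are keyed by vocabulary index, and the inverse dictionary restores the words. The function names and the smoothed IDF formula log((N + 1) / (df + 1)) are assumptions for this example.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Map each distinct word to a stable integer index (alphabetical order)."""
    vocab = sorted({w for doc in docs for w in doc})
    return {w: i for i, w in enumerate(vocab)}


def tfidf(docs, word_to_id):
    """Return one {word_index: tf-idf weight} dictionary per document."""
    n = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for doc in docs for w in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append({
            word_to_id[w]: c * math.log((n + 1) / (df[w] + 1))
            for w, c in tf.items()
        })
    return rows
```

Inverting the vocabulary with `{i: w for w, i in word_to_id.items()}` recovers the original word for any feature index, which a hashed feature space cannot do in general.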
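Hot Topic Identification in Electric Power Complaint Texts Based on the LDA Model (Cited by: 3)
9
Authors: 许睿, 龙丹, 刘佳, 刘畅. Source: 《云南大学学报(自然科学版)》 (CAS, CSCD, PKU Core), 2020, Issue S02, pp. 26-31, 6 pages
Electric power customer complaints are a core indicator of customer satisfaction for power grid companies. Traditional manual analysis is inefficient and not timely at discovering hot topics, so this paper proposes a method for identifying hot topics in power complaint texts based on the LDA (Latent Dirichlet Allocation) model. First, the TF-IDF method extracts the top-N keywords from each power complaint text as its feature word set, and a bag-of-words model represents the text as a vector. Second, the LDA model extracts the topics of the texts, yielding a "document-topic" matrix and a "topic-word" matrix. Then, based on each keyword's probability in the topic-word matrix and its frequency in the text, the keyword with the largest weight is selected as the feature word of the topic. Finally, document-topic support is used to identify hot topics among the extracted topics. Experimental results show that the method can accurately identify hot topics in power complaint texts.
Keywords: topic identification; LDA (latent Dirichlet allocation) model; TF-IDF; power complaint text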
Unsupervised Feature Selection for LDA (in English) (Cited by: 1)
10
Authors: 徐蔚然, 杜刚, 陈光, 郭军, 杨洁. Source: China Communications (SCIE, CSCD), 2011, Issue 5, pp. 54-62, 9 pages
As a generative model, the Latent Dirichlet Allocation model focuses on how to generate data and lacks optimization of the topics' discrimination capability. This paper aims to improve the discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the Information Gain of the word for topics, which is used to distinguish between "general words" and "special words" in LDA topics. Therefore, we add a constraint to the LDA objective function to let the "general words" occur only in "general topics" rather than in "special topics". Then a heuristic algorithm is presented to obtain the solution. Experiments show that this method can not only improve the information gain of topics, but also make the topics easier for humans to understand.
Keywords: pattern recognition; unsupervised feature selection; latent Dirichlet allocation; general topic; special topic
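An Information Recommendation Model Based on Microbial Classification
11
Authors: 李威耀, 范国梅, 马俊才. Source: 《计算机应用研究》 (CSCD, PKU Core), 2020, Issue S01, pp. 211-212, 210, 3 pages
The rapid development of information technology means that hundreds of millions of pieces of information are generated on the Internet every day. Faced with such a volume, relying on manual effort alone to filter useful information is no longer realistic, so recommendation algorithms that can quickly filter useful information emerged and have developed rapidly. To meet the practical need to quickly classify acquired microbial information, this work uses natural language processing: historical information is obtained from user access logs, the latent Dirichlet allocation (LDA) algorithm extracts a topic model from the accessed information, and the acquired microbial information is classified. A collaborative recommendation algorithm is then used to build an information recommendation model based on microbial classification, with which information can be classified quickly. The experimental results show relatively high accuracy, and the model can provide personalized services to users.
Keywords: recommendation algorithm; access log; latent Dirichlet allocation (LDA); topic model; collaborative recommendation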
A Multi-Classifier Based Prediction Model for Phishing Emails Detection Using Topic Modelling, Named Entity Recognition and Image Processing
12
Authors: C. Emilin Shyni, S. Sarju, S. Swamynathan. Source: Circuits and Systems, 2016, Issue 9, pp. 2507-2520, 14 pages
Phishing is the act of attempting to steal a user's financial and personal information, such as credit card numbers and passwords, by pretending to be a trustworthy participant during online communication. Attackers may direct the users to a fake website that could seem legitimate, and then gather useful and confidential information using that site. In order to protect users from Social Engineering techniques such as phishing, various measures have been developed, including improvement of Technical Security. In this paper, we propose a new technique, namely, "A Prediction Model for the Detection of Phishing e-mails using Topic Modelling, Named Entity Recognition and Image Processing". The features extracted are Topic Modelling features, Named Entity features and Structural features. A multi-classifier prediction model is used to detect the phishing mails. Experimental results show that the multi-classification technique outperforms the single-classifier-based prediction techniques. The resultant accuracy of the detection of phishing e-mail is 99%, with the highest False Positive Rate being 2.1%.
Keywords: phishing; conditional random field classifier; latent Dirichlet allocation; natural language processing; machine learning; image segmentation; image processing
A bibliometric analysis of worldwide cancer research using machine learning methods
13
Authors: Lianghong Lin, Likeng Liang, Maojie Wang, Runyue Huang, Mengchun Gong, Guangjun Song, Tianyong Hao. Source: Cancer Innovation, 2023, Issue 3, pp. 219-232, 14 pages
With the progress and development of computer technology, applying machine learning methods to cancer research has become an important research field. To analyze the most recent research status and trends, main research topics, topic evolutions, research collaborations, and potential directions of this research field, this study conducts a bibliometric analysis on 6206 research articles worldwide collected from PubMed between 2011 and 2021 concerning cancer research using machine learning methods. Python is used as a tool for bibliometric analysis, Gephi is used for social network analysis, and the Latent Dirichlet Allocation model is used for topic modeling. The trend analysis of articles not only reflects the innovative research at the intersection of machine learning and cancer but also demonstrates its vigorous development and increasing impacts. In terms of journals, Nature Communications is the most influential journal and Scientific Reports is the most prolific one. The United States and Harvard University have contributed the most to cancer research using machine learning methods. As for the research topic, "Support Vector Machine," "classification," and "deep learning" have been the core focuses of the research field. Findings are helpful for scholars and related practitioners to better understand the development status and trends of cancer research using machine learning methods, as well as to have a deeper understanding of research hotspots.
Keywords: bibliometric analysis; cancer; latent Dirichlet allocation; machine learning; research topic; topic evolution
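The Impact of National Natural Science Foundation of China Discipline Planning on Discipline Development: A Bibliometric Analysis and Reflection (Cited by: 7)
14
Authors: 陈思华, 邱焓, 霍红. Source: 《科学通报》 (EI, CAS, CSCD, PKU Core), 2022, Issue 7, pp. 630-639, 10 pages
Since China formulated its first science and technology development plan in the 1950s, the "Long-Term Plan for the Development of Science and Technology, 1956-1967", China has used a "planning model" to promote the development of science and technology [1]. The National Natural Science Foundation of China (NSFC) is a key funding body for scientific research in China. Within China's science and technology funding system, NSFC has its own distinctive funding structure and planning path, giving priority to basic research while also addressing major national needs. In order for discipline development to adapt more promptly to the developments and changes of the times, and to match major national needs more precisely, ...
Keywords: National Natural Science Foundation of China; discipline plan; text analysis; latent Dirichlet allocation; fitting rate; research hot spots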
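Spatiotemporal Analysis of the Evolution of Tibetan Human Activities in Qinghai from the Perspective of Place Names (Cited by: 1)
15
Authors: 栾桂泽, 彭直琰, 蔡敬芝, 富瑶, 宋璐, 沈克强, 赵飞. Source: 《测绘地理信息》 (CSCD), 2021, Issue 5, pp. 163-168, 6 pages
Using more than 95,000 place-name records from Qinghai Province as the study sample, and applying methods such as forward stepwise regression analysis and the latent Dirichlet allocation (LDA) model, this study systematically analyzes the characteristics of Tibetan place names in Qinghai and the spatiotemporal evolution of human activities. The results show that: (1) the formation of Tibetan settlement areas was mainly influenced by terrain and river systems, with terrain having the greatest influence; (2) after 1949, a large number of ethnic regional autonomy organizations emerged under the influence of ethnic policy; (3) many herders shifted to settled work, forming the present Tibetan settlement areas; (4) over the past 300 years, the main migration area of Tibetans in Qinghai has been Hainan Tibetan Autonomous Prefecture and its surroundings, and this migration spread Tibetan Buddhism to some extent.
Keywords: Tibetan settlement areas; place-name research; human activities; stepwise regression analysis; latent Dirichlet allocation (LDA) model; migration analysis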
Disaster Impacts Surveillance from Social Media with Topic Modeling and Feature Extraction: Case of Hurricane Harvey (Cited by: 1)
16
Authors: Volodymyr V. Mihunov, Navid H. Jafari, Kejin Wang, Nina S. N. Lam, Dylan Govender. Source: International Journal of Disaster Risk Science (SCIE, CSCD), 2022, Issue 5, pp. 729-742, 14 pages
Twitter can supply useful information on infrastructure impacts to emergency managers during major disasters, but it is time consuming to filter through many irrelevant tweets. Previous studies have identified the types of messages that can be found on social media during disasters, but few solutions have been proposed to efficiently extract useful ones. We present a framework that can be applied in a timely manner to provide disaster impact information sourced from social media. The framework is tested on a well-studied and data-rich case of Hurricane Harvey. The procedures consist of filtering the raw Twitter data based on keywords, location, and tweet attributes, and then applying latent Dirichlet allocation (LDA) to separate the tweets from the disaster affected area into categories (topics) useful to emergency managers. The LDA revealed that out of 24 topics found in the data, nine were directly related to disaster impacts, for example, outages, closures, flooded roads, and damaged infrastructure. Features such as frequent hashtags, mentions, URLs, and useful images were then extracted and analyzed. The relevant tweets, along with useful images, were correlated at the county level with flood depth, distributed disaster aid (damage), and population density. Significant correlations were found between the nine relevant topics and population density but not flood depth and damage, suggesting that more research into the suitability of social media data for disaster impacts modeling is needed. The results from this study provide baseline information for such efforts in the future.
Keywords: disaster impacts; Hurricane Harvey; infrastructure impacts; latent Dirichlet allocation (LDA); social media analysis; Twitter data
Supervised topic models with weighted words: multi-label document classification (Cited by: 1)
17
Authors: Yue-peng ZOU, Ji-hong OUYANG, Xi-ming LI. Source: Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2018, Issue 4, pp. 513-523, 11 pages
Supervised topic modeling algorithms have been successfully applied to multi-label document classification tasks. Representative models include labeled latent Dirichlet allocation (L-LDA) and dependency-LDA. However, these models neglect the class frequency information of words (i.e., the number of classes where a word has occurred in the training data), which is significant for classification. To address this, we propose a method, namely the class frequency weight (CF-weight), to weight words by considering the class frequency knowledge. This CF-weight is based on the intuition that a word with higher (lower) class frequency will be less (more) discriminative. In this study, the CF-weight is used to improve L-LDA and dependency-LDA. A number of experiments have been conducted on real-world multi-label datasets. Experimental results demonstrate that CF-weight based algorithms are competitive with the existing supervised topic models.
Keywords: supervised topic model; multi-label classification; class frequency; labeled latent Dirichlet allocation (L-LDA); dependency-LDA
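The abstract above defines class frequency as the number of classes in which a word occurs in the training data, and states the intuition that a higher class frequency means a less discriminative word. The sketch below computes class frequency for multi-label documents and derives an illustrative weight from it. The abstract does not give the CF-weight formula, so the IDF-style weight log(C / cf) + 1 used here, along with the function names, is purely an assumption for this example.

```python
import math
from collections import defaultdict


def class_frequency(docs, labels):
    """Return {word: number of distinct classes whose documents contain it}.

    docs is a list of tokenized documents; labels[i] is the list of class
    labels attached to docs[i] (multi-label setting).
    """
    classes_of_word = defaultdict(set)
    for doc, doc_labels in zip(docs, labels):
        for w in doc:
            classes_of_word[w].update(doc_labels)
    return {w: len(cs) for w, cs in classes_of_word.items()}


def cf_weight(cf, num_classes):
    """Illustrative weight: decreases as class frequency grows (assumed form)."""
    return {w: math.log(num_classes / c) + 1.0 for w, c in cf.items()}
```

Under this assumed form, a word seen in every class gets weight 1.0 and words confined to fewer classes get larger weights, which matches the stated intuition without claiming to reproduce the published method.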
Topic-Feature Lattices Construction and Visualization for Dynamic Topic Number
18
Authors: Kai WANG, Fuzhi WANG. Source: Journal of Systems Science and Information (CSCD), 2021, Issue 5, pp. 558-574, 17 pages
The topic recognition for dynamic topic number can realize the dynamic update of super parameters, and obtain the probability distribution of dynamic topics in the time dimension, which helps to clear the understanding and tracking of convection text data. However, the current topic recognition model tends to be based on a fixed number of topics K and lacks multi-granularity analysis of subject knowledge. Therefore, it is impossible to deeply perceive the dynamic change of the topic in the time series. By introducing a novel approach on the basis of the Infinite Latent Dirichlet Allocation model, a topic feature lattice under the dynamic topic number is constructed. In the model, documents, topics and vocabularies are jointly modeled to generate two probability distribution matrices: documents-topics and topic-feature words. Afterwards, the association intensity is computed between the topic and its feature vocabulary to establish the topic formal context matrix. Finally, the topic feature is induced according to the formal concept analysis (FCA) theory. The topic feature lattice under dynamic topic number (TFL_DTN) model is validated on the real dataset by comparing with the mainstream methods. Experiments show that this model is more in line with actual needs, and achieves better results in semi-automatic modeling of topic visualization analysis.
Keywords: dynamic topic number; infinite latent Dirichlet allocation (ILDA); formal concept analysis; topic feature lattice; topic feature lattice under dynamic topic number (TFL_DTN) model