29 articles found
Unsupervised Feature Selection for Latent Dirichlet Allocation (Cited by: 1)
1
Authors: 徐蔚然, 杜刚, 陈光, 郭军, 杨洁 《China Communications》 SCIE CSCD 2011, No. 5, pp. 54-62 (9 pages)
As a generative model, Latent Dirichlet Allocation (LDA) focuses on how to generate data and lacks any optimization of the topics' discrimination capability. This paper aims to improve that capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the Information Gain of the word for topics, which is used to distinguish between "general words" and "special words" in LDA topics. We therefore add a constraint to the LDA objective function so that "general words" occur only in "general topics" rather than in "special topics". A heuristic algorithm is then presented to obtain the solution. Experiments show that this method not only improves the information gain of topics but also makes the topics easier for humans to understand.
Keywords: pattern recognition; unsupervised feature selection; latent Dirichlet allocation; general topic; special topic
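The abstract above approximates a word's discrimination capability by its Information Gain for topics. As a rough illustration only, the Python sketch below computes the information gain that a binary word-presence indicator carries about the topic variable. The function names and the presence/absence formulation are assumptions made for this sketch, not the paper's exact definition.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(p_topic, p_word_given_topic):
    """Information gain of 'word present / absent' about the topic.

    p_topic[k]            -- prior probability of topic k (hypothetical input)
    p_word_given_topic[k] -- probability that the word appears under topic k
    """
    p_word = sum(pt * pw for pt, pw in zip(p_topic, p_word_given_topic))
    conditional = 0.0
    for present, p_event in ((True, p_word), (False, 1.0 - p_word)):
        if p_event <= 0:
            continue
        # Posterior over topics given that the word is present (or absent).
        posterior = [
            pt * (pw if present else 1.0 - pw) / p_event
            for pt, pw in zip(p_topic, p_word_given_topic)
        ]
        conditional += p_event * entropy(posterior)
    return entropy(p_topic) - conditional
```

Under this formulation, a word that is equally likely under every topic (a "general word") has an information gain near zero, while a word concentrated in one topic (a "special word") scores higher.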
Fuzzy Based Latent Dirichlet Allocation for Intrusion Detection in Cloud Using ML
2
Authors: S. Ranjithkumar, S. Chenthur Pandian 《Computers, Materials & Continua》 SCIE EI 2022, No. 3, pp. 4261-4277 (17 pages)
The growth of cloud computing in modern technology is drastic, provisioning services to various industries in which data security is a common issue that bears on the intrusion detection system (IDS). IDS are considered an essential factor in fulfilling security requirements. Recently, diverse Machine Learning (ML) approaches have been used to model effective IDS. Most IDS are based on ML techniques and are categorized as supervised or unsupervised. However, IDS with supervised learning rely on labeled data, which is considered a common drawback, and they fail to identify attack patterns. Similarly, unsupervised learning fails to provide satisfactory outcomes. Therefore, this work concentrates on a semi-supervised learning model, a fuzzy-based semi-supervised approach through Latent Dirichlet Allocation (F-LDA), for intrusion detection in cloud systems, which helps to resolve the aforementioned challenges. Initially, LDA gives better generalization ability for training on the labeled data. Similarly, to handle the unlabeled data, a fuzzy model is adopted for analyzing the dataset. Preprocessing is carried out to eliminate data redundancy in the network dataset. To validate the efficiency of F-LDA for intrusion detection, the model is tested on the NSL-KDD cup dataset, a common traffic dataset. Simulation is done in the MATLAB environment and gives better accuracy when compared against the benchmark standard dataset. The proposed F-LDA gives better accuracy and more promising outcomes than the prevailing approaches.
Keywords: cloud security; fuzzy model; latent Dirichlet allocation; preprocessing; NSL-KDD
PG-CODE: Latent Dirichlet Allocation Embedded Policy Knowledge Graph for Government Department Coordination (Cited by: 3)
3
Authors: Yilin Kang, Renwei Ou, Yi Zhang, Hongling Li, Shasha Tian 《Tsinghua Science and Technology》 SCIE EI CAS CSCD 2022, No. 4, pp. 680-691 (12 pages)
Government policy-group integration and policy-chain inference are significant to the execution of strategies in current Chinese society. Specifically, the coordination of hierarchical policies implemented among government departments is one of the key challenges to rural revitalization. In recent years, various well-established quantitative methods have been proposed to evaluate policy coordination, but the majority of these rely on manual analysis, which can lead to subjective results. Thus, in this paper, a novel approach called "policy knowledge graph for the coordination among the government departments" (PG-CODE) is proposed, which incorporates topic modeling into policy knowledge graphs. Similar to a knowledge graph, a policy knowledge graph uses a graph-structured data model to integrate policy discourse. With latent Dirichlet allocation embedding, a policy knowledge graph can capture the underlying topics of the policies. Furthermore, coordination strength and topic diffusion among hierarchical departments can be inferred from PG-CODE, as it provides a better representation of coordination within the policy space. We implemented and evaluated PG-CODE in the field of rural innovation and entrepreneurship policy, and the results effectively demonstrate improved coordination among departments.
Keywords: policy knowledge graph; department coordination; topic diffusion; latent Dirichlet allocation; rural revitalization
Online Latent Dirichlet Allocation Model Based on Sentiment Polarity Time Series
4
Authors: HUANG Bo, JU Jiaji, CHEN Huan, ZHU Yimin, LIU Jin, SHI Zhicai 《Wuhan University Journal of Natural Sciences》 CAS CSCD 2021, No. 6, pp. 464-472 (9 pages)
The Product Sensitive Online Dirichlet Allocation model (PSOLDA) proposed in this paper mainly uses the sentiment polarity of topic words in review text to improve the accuracy of topic evolution. First, we use Latent Dirichlet Allocation (LDA) to obtain the distribution of topic words in the current time window. Second, word2vec word vectors are used as auxiliary information to determine sentiment polarity and obtain the sentiment polarity distribution of the current topic. Finally, the sentiment polarity changes of the topics between the previous and next time windows are mapped to sentiment factors, which control the distribution of topic words in the next time window. The experimental results show that the PSOLDA model decreases the probability distribution by 0.1601, while Online Twitter LDA only increases by 0.0699. The topic evolution method proposed in this paper, which integrates the sentiment information of topic words, is better than the traditional model.
Keywords: topic evolution; sentiment factors; word vector; latent Dirichlet allocation (LDA)
Improving Parameter Estimation and Defensive Ability of Latent Dirichlet Allocation Model Training Under Rényi Differential Privacy
5
Authors: Tao Huang, Su-Yun Zhao, Hong Chen, Yi-Xuan Liu 《Journal of Computer Science & Technology》 SCIE EI CSCD 2022, No. 6, pp. 1382-1397 (16 pages)
Latent Dirichlet allocation (LDA) is a topic model widely used for discovering hidden semantics in massive text corpora. Collapsed Gibbs sampling (CGS), a widely used algorithm for learning the parameters of LDA, carries a risk of privacy leakage. Specifically, the word count statistics and updates of latent topics in CGS, which are essential for parameter estimation, could be employed by adversaries to conduct effective membership inference attacks (MIAs). Until now, two kinds of methods have been used in CGS to defend against MIAs: adding noise to word count statistics and utilizing inherent privacy. Each has its limitations. Noise sampled from the Laplacian distribution sometimes produces negative word count statistics, which leads to poor parameter estimation in CGS. Utilizing inherent privacy provides only weak guaranteed privacy when defending against MIAs. An effective framework that obtains accurate parameter estimates with guaranteed differential privacy is therefore desirable. The key issue in obtaining accurate parameter estimates when introducing differential privacy into CGS is making good use of the privacy budget so that a precise noise scale is derived. This work is the first to introduce Rényi differential privacy (RDP) into CGS, and we propose RDP-LDA, an effective framework for analyzing the privacy loss of any differentially private CGS. RDP-LDA can be used to derive a tighter upper bound on privacy loss than the overestimated results that existing differentially private CGS obtains with ε-DP. In RDP-LDA, we propose a novel truncated-Gaussian mechanism that keeps word count statistics non-negative, and we propose distribution perturbation, which provides more rigorous guaranteed privacy than utilizing inherent privacy. Experiments validate that our proposed methods produce more accurate parameter estimation under the JS-divergence metric and obtain lower precision and recall when defending against MIAs.
Keywords: latent Dirichlet allocation; parameter estimation; membership inference attack; Rényi differential privacy
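The abstract above refers to the word count statistics and latent topic updates that collapsed Gibbs sampling maintains. To make those quantities concrete, here is a minimal sketch of plain (non-private) CGS for LDA. It is not the RDP-LDA framework, adds no noise, and gives no privacy guarantee; the function name, the hyperparameter defaults, and the integer word-id input format are assumptions for illustration.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Plain collapsed Gibbs sampling for LDA.

    docs is a list of documents, each a list of integer word ids
    in the range [0, vocab_size).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_dk
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_kw
    topic_total = [0] * n_topics                              # n_k
    assignments = []

    # Random initial topic for every token, with matching count statistics.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, assignments
```

The `topic_word` table is the kind of word count statistic the abstract describes as needing protection: each entry changes as individual tokens are reassigned.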
Exploit latent Dirichlet allocation for collaborative filtering
6
Authors: Zhoujun LI, Haijun ZHANG, Senzhang WANG, Feiran HUANG, Zhenping LI, Jianshe ZHOU 《Frontiers of Computer Science》 SCIE EI CSCD 2018, No. 3, pp. 571-581 (11 pages)
Previous work on the one-class collaborative filtering (OCCF) problem can be roughly categorized into pointwise methods, pairwise methods, and content-based methods. A fundamental assumption of these approaches is that all missing values in the user-item rating matrix are considered negative. However, this assumption may not hold because the missing values may contain both negative and positive examples. For example, a user who fails to give positive feedback about an item may not necessarily dislike it; he may simply be unfamiliar with it. Meanwhile, content-based methods, e.g. collaborative topic regression (CTR), usually require textual content information about the items, and thus their applicability is largely limited when the text information is not available. In this paper, we propose to apply the latent Dirichlet allocation (LDA) model to OCCF to address the above-mentioned problems. The basic idea of this approach is that items are regarded as words, users are considered as documents, and the user-item feedback matrix constitutes the corpus. Our model drops the strong assumption that missing values are all negative and only utilizes the observed data to predict a user's interest. Additionally, the proposed model does not need content information about the items. Experimental results indicate that the proposed method outperforms previous methods on various ranking-oriented evaluation metrics. We further combine this method with a matrix factorization-based method to tackle the multi-class collaborative filtering (MCCF) problem, which also achieves better performance on predicting user ratings.
Keywords: latent Dirichlet allocation; one-class collaborative filtering; multi-class collaborative filtering
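The mapping described in the abstract above (items as words, users as documents, the feedback matrix as the corpus) can be sketched in a few lines of Python. This shows only the corpus construction, under the assumption that feedback arrives as (user, item) pairs of observed positive interactions; the function name and input format are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def feedback_to_corpus(feedback):
    """Turn observed (user, item) feedback into an LDA-style corpus.

    Each user becomes one "document"; each item the user interacted with
    becomes one "word" occurrence, encoded as an integer item id.
    Missing user-item pairs are simply absent, not treated as negatives.
    """
    item_ids = {}
    docs = defaultdict(list)
    for user, item in feedback:
        idx = item_ids.setdefault(item, len(item_ids))
        docs[user].append(idx)
    return dict(docs), item_ids
```

The resulting per-user lists of item ids can then be passed to any LDA implementation that accepts documents as sequences of word ids.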
Enhanced Topic-Aware Summarization Using Statistical Graph Neural Networks
7
Authors: Ayesha Khaliq, Salman Afsar Awan, Fahad Ahmad, Muhammad Azam Zia, Muhammad Zafar Iqbal 《Computers, Materials & Continua》 SCIE EI 2024, No. 8, pp. 3221-3242 (22 pages)
The rapid expansion of online content and big data has created an urgent need for efficient summarization techniques that let readers swiftly comprehend vast textual documents without compromising their original integrity. Current approaches in Extractive Text Summarization (ETS) leverage the modeling of inter-sentence relationships, a task of paramount importance in producing coherent summaries. This study introduces a model that integrates Graph Attention Networks (GATs) with Bidirectional Encoder Representations from Transformers (BERT) and Latent Dirichlet Allocation (LDA), further enhanced by Term Frequency-Inverse Document Frequency (TF-IDF) values, to improve sentence selection by capturing comprehensive topical information. Our approach constructs a graph with nodes representing sentences, words, and topics, thereby increasing interconnectivity and enabling a more refined understanding of text structures. The model is extended from Single-Document Summarization to Multi-Document Summarization (MDS), offering significant improvements over existing models such as THGS-GMM and Topic-GraphSum, as demonstrated by empirical evaluations on benchmark news datasets such as Cable News Network (CNN)/Daily Mail (DM) and Multi-News. The results consistently demonstrate superior performance, showcasing the model's robustness in handling complex summarization tasks across single- and multi-document contexts. This research not only advances the integration of BERT and LDA within GATs but also emphasizes the model's capacity to effectively manage global information and adapt to diverse summarization challenges.
Keywords: summarization; graph attention network; bidirectional encoder representations from transformers; latent Dirichlet allocation; term frequency-inverse document frequency
Analysis of Public Sentiment regarding COVID-19 Vaccines on the Social Media Platform Reddit
8
Authors: Lucien Dikla Ngueleo, Jules Pagna Disso, Armel Ayimdji Tekemetieu, Justin Moskolaï Ngossaha, Michael Nana Kameni 《Journal of Computer and Communications》 2024, No. 2, pp. 80-108 (29 pages)
This study undertakes a thorough analysis of sentiment regarding COVID-19 vaccines within the r/Coronavirus subreddit community on Reddit. We collected and processed 34,768 comments, spanning from November 20, 2020, to January 17, 2021, using sentiment calculation methods such as TextBlob and Twitter-RoBERTa-Base-sentiment to categorize comments as positive, negative, or neutral. The methodology involved the use of Count Vectorizer as a vectorization technique and the implementation of ensemble algorithms such as XGBoost and Random Forest, achieving an accuracy of approximately 80%. Furthermore, through latent Dirichlet allocation, we identified 23 distinct reasons for vaccine distrust among negative comments. These findings are crucial for understanding the community's attitudes towards vaccination and can guide targeted public health messaging. Our study not only provides insights into public opinion during a critical health crisis but also demonstrates the effectiveness of combining natural language processing tools and ensemble algorithms in sentiment analysis.
Keywords: COVID-19 vaccine; TextBlob; Twitter-RoBERTa-Base-Sentiment; sentiment analysis; latent Dirichlet allocation
LDA-Based Mining and Visualization of Key Points for Safety Hazard Inspection in Metro Construction (Cited by: 3)
9
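Authors: 潘杏, 钟波涛, 黑永健, 骆汉宾 《土木建筑工程信息技术》 2021, No. 2, pp. 7-14 (8 pages)
With the rapid construction of metro systems and the establishment of hazard inspection systems, these systems have accumulated a large number of hazard inspection records. The records, however, are cluttered with redundant information, and the related work relies heavily on guidelines and expert experience, requiring substantial labor costs. To improve the efficiency of hazard inspection and safety management decision-making, and to help automate the entire inspection process, this paper proposes a framework for automatically analyzing hazard inspection texts based on text mining and visualization techniques. The framework consists of four main steps. First, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is used to give an overall summary of the keywords in the hazard descriptions. Second, keywords with high feature values are selected by TF-IDF, and a Latent Dirichlet Allocation (LDA) model with Gibbs sampling is used to identify the latent topic information and key inspection points hidden in the large-scale corpus of hazard descriptions. Third, incorporating the time dimension, Word Cloud (WC) techniques are used to visualize the hazard descriptions and draw word cloud evolution charts. Fourth, a Word Co-occurrence Network (WCN) model is used to mine co-occurrence relationships among hazards. The framework was applied and validated in an analysis of construction safety hazard inspection records from the Wuhan Metro for 2016-2018. The experimental results show that the framework effectively extracts the key inspection points and visualization information corresponding to 34 categories of hazards.
Keywords: safety management; metro construction safety; key points of hazard inspection; text mining; latent Dirichlet allocation model; data visualization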
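The fourth step in the abstract above mines co-occurrence relationships with a word co-occurrence network. The sketch below shows one simple way such a network's edge weights could be counted, assuming each inspection record has already been tokenized into a list of keywords. The function name, the record format, and the choice to count each pair once per record are assumptions for illustration, not the paper's WCN model.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(records):
    """Count how many records each unordered keyword pair appears in.

    records is an iterable of token lists, one list per inspection record.
    Returns a Counter mapping (word_a, word_b), with word_a < word_b,
    to the number of records that contain both words.
    """
    edges = Counter()
    for tokens in records:
        # Deduplicate within a record so a pair is counted once per record.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges
```

The edge counts can then serve as weights in a graph whose nodes are the keywords.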
News Text Topic Clustering Optimized Method Based on TF-IDF Algorithm on Spark (Cited by: 17)
10
Authors: Zhuo Zhou, Jiaohua Qin, Xuyu Xiang, Yun Tan, Qiang Liu, Neal N. Xiong 《Computers, Materials & Continua》 SCIE EI 2020, No. 1, pp. 217-231 (15 pages)
Because text topic clustering is slow on a stand-alone architecture in the big data setting, this paper takes news text as the research object and proposes an LDA text topic clustering algorithm based on the Spark big data platform. Since the TF-IDF (term frequency-inverse document frequency) algorithm under Spark maps words irreversibly, the mapped word indexes cannot be traced back to the original words. This paper proposes an optimized TF-IDF method under Spark that ensures the text words can be restored. First, text features are extracted by the TF-IDF algorithm combined with CountVectorizer as proposed in this paper; the features are then input to the LDA (Latent Dirichlet Allocation) topic model for training. Finally, the text topic clustering is obtained. Experimental results show that, for large data samples, the processing speed of LDA topic model clustering is improved on Spark. At the same time, compared with the LDA topic model based on word frequency input, the model proposed in this paper reduces perplexity.
Keywords: news text topic clustering; Spark platform; CountVectorizer algorithm; TF-IDF algorithm; latent Dirichlet allocation model
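The abstract above turns on keeping an explicit vocabulary so that feature indexes can be traced back to words. The plain-Python sketch below illustrates that idea only; it does not use Spark and is not the paper's implementation. The function name and the smoothed IDF form log((N + 1) / (df + 1)) are assumptions chosen for this example.

```python
import math
from collections import Counter


def tfidf_with_vocabulary(docs):
    """Compute sparse TF-IDF vectors and return the vocabulary with them.

    docs is a list of token lists. Each returned vector maps a word index
    to its TF-IDF weight; vocab[index] recovers the original word.
    """
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            index[w]: c * math.log((n_docs + 1) / (doc_freq[w] + 1))
            for w, c in counts.items()
        })
    return vectors, vocab
```

Because the vocabulary is built explicitly rather than by hashing, the highest-weighted index in any vector can be looked up in `vocab` to recover the word itself.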
Hot Topic Identification in Electric Power Complaint Texts Based on the LDA Model (Cited by: 3)
11
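Authors: 许睿, 龙丹, 刘佳, 刘畅 《云南大学学报(自然科学版)》 CAS CSCD PKU Core 2020, No. S02, pp. 26-31 (6 pages)
Electric power customer complaints are a core indicator for evaluating customer satisfaction with power grid companies. Because traditional manual analysis is inefficient and not timely in discovering hot topics, a method based on the LDA (Latent Dirichlet Allocation) model is proposed for identifying hot topics in power complaint texts. First, the TF-IDF method is used to extract the top-N keywords from each power complaint text as its feature word set, and a bag-of-words model is used to represent the text as a vector. Second, the LDA model is used to extract the topics of the texts, yielding a "text-topic" matrix and a "topic-word" matrix. Then, based on the probability with which each keyword appears in the topic-word matrix and the frequency with which it appears in the text, the keywords with the largest weights are selected as the feature words of the topic. Finally, document-topic support is used to identify the hot topics among the extracted topics. Experimental results show that the method can accurately identify hot topics in power complaint texts.
Keywords: topic identification; LDA (latent Dirichlet allocation) model; TF-IDF; electric power complaint texts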
An Information Recommendation Model Based on Microbial Classification
12
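Authors: 李威耀, 范国梅, 马俊才 《计算机应用研究》 CSCD PKU Core 2020, No. S01, pp. 211-212, 210 (3 pages)
The rapid development of information technology means that hundreds of millions of pieces of information are generated on the Internet every day. Faced with such a massive volume, relying purely on human effort to filter out useful information is no longer realistic, so recommendation algorithms that can quickly filter useful information emerged and have developed rapidly. To meet the practical need to quickly classify acquired microbial information, natural language processing techniques are applied: historical information is obtained from user access logs, the latent Dirichlet allocation (LDA) algorithm is used to extract a topic model from the accessed information, and the acquired microbial information is classified. A collaborative recommendation algorithm is then used to build an information recommendation model based on microbial classification, with which information can be classified quickly. The experimental results show high accuracy, and the model can provide users with personalized services.
Keywords: recommendation algorithm; access log; latent Dirichlet allocation (LDA); topic model; collaborative recommendation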
Behavior Clustering for Anomaly Detection (Cited by: 1)
13
Authors: Zhu Xudong, Li Hui, Liu Zhijing 《China Communications》 SCIE CSCD 2010, No. 6, pp. 17-23 (7 pages)
We present a novel framework for automatic behavior clustering and unsupervised anomaly detection in a large video set. The framework consists of the following key components: 1) Drawing from natural language processing, we introduce a compact and effective behavior representation method as a stochastic sequence of spatiotemporal events, in which the global structural information of behaviors is analyzed using their local action statistics. 2) The natural grouping of behavior patterns is discovered through a novel clustering algorithm. 3) A run-time accumulative anomaly measure is introduced to detect abnormal behavior, whereas normal behavior patterns are recognized once sufficient visual evidence has become available, based on an online Likelihood Ratio Test (LRT) method. This ensures robust and reliable anomaly detection and normal behavior recognition in the shortest possible time. Experimental results demonstrate the effectiveness and robustness of our approach using noisy and sparse data sets collected from a real surveillance scenario.
Keywords: computer vision; anomaly detection; Hidden Markov Model; latent Dirichlet allocation
Self-Adaptive Topic Model: A Solution to the Problem of "Rich Topics Get Richer" (Cited by: 1)
14
Authors: FANG Ying 《China Communications》 SCIE CSCD 2014, No. 12, pp. 35-43 (9 pages)
The problem of "rich topics get richer" (RTGR) is common in topic models and leads to a wrong topic distribution if the distribution process is not intervened in. In the standard LDA (Latent Dirichlet Allocation) model, each word in all the documents has the same statistical ability. In fact, words have different impacts on different topics. Guided by this idea, we extend ILDA (Infinite LDA) by considering the bias role of words in dividing the topics. We propose a self-adaptive topic model specifically to overcome the RTGR problem. The model proposed in this paper addresses three issues: (1) the number of topics changes with the collection of documents, which suits dynamic data; (2) words have discriminating attributes with respect to the topic distribution; (3) a self-adaptive method is used to realize automatic re-sampling. To verify our model, we design a topic evolution analysis system that realizes the following functions: topic classification in each cycle, topic correlation between adjacent cycles, and the strength calculation of the subtopics in order. Experiments on both the NIPS corpus and our self-built news collections showed that the system could meet the given demands and that the results were feasible.
Keywords: topic model; infinite latent Dirichlet allocation; Dirichlet process; topic evolution
A Multi-Classifier Based Prediction Model for Phishing Emails Detection Using Topic Modelling, Named Entity Recognition and Image Processing
15
Authors: C. Emilin Shyni, S. Sarju, S. Swamynathan 《Circuits and Systems》 2016, No. 9, pp. 2507-2520 (14 pages)
Phishing is the act of attempting to steal a user's financial and personal information, such as credit card numbers and passwords, by pretending to be a trustworthy participant during online communication. Attackers may direct users to a fake website that seems legitimate and then gather useful and confidential information using that site. To protect users from social engineering techniques such as phishing, various measures have been developed, including improvements to technical security. In this paper, we propose a new technique, namely "A Prediction Model for the Detection of Phishing e-mails using Topic Modelling, Named Entity Recognition and Image Processing". The features extracted are Topic Modelling features, Named Entity features, and Structural features. A multi-classifier prediction model is used to detect phishing e-mails. Experimental results show that the multi-classification technique outperforms single-classifier-based prediction techniques. The resulting accuracy of phishing e-mail detection is 99%, with the highest False Positive Rate being 2.1%.
Keywords: phishing; Conditional Random Field; classifier; latent Dirichlet allocation; natural language processing; machine learning; image segmentation; image processing
Deep Learning and Network Analysis: Classifying and Visualizing Geologic Hazard Reports
16
Authors: Wenjia Li, Liang Wu, Xinde Xu, Zhong Xie, Qinjun Qiu, Hao Liu, Zhen Huang, Jianguo Chen 《Journal of Earth Science》 SCIE CAS CSCD 2024, No. 4, pp. 1289-1303 (15 pages)
If progress is to be made toward improving geohazard management and emergency decision-making, then lessons need to be learned from past geohazard information. A geologic hazard report provides a useful and reliable source of information about the occurrence of an event, along with detailed information about the conditions or factors of the geohazard. Analyzing such reports, however, can be challenging because these texts are often presented in unstructured long-text formats and contain rich, specialized, and detailed information. Automatic text classification is commonly used to mine disaster text data in open domains (e.g., news and microblogs), but it has limitations in capturing contextual long-distance dependencies and is insensitive to discourse order. These deficiencies are most obviously exposed in long texts. Therefore, this paper uses Bidirectional Encoder Representations from Transformers (BERT) to model long text. A softmax layer is then used to automatically extract text features and classify geohazards without manual features. The latent Dirichlet allocation (LDA) model is used to examine the interdependencies between causal variables in order to visualize geohazards. The proposed method is useful in enabling the machine-assisted interpretation of text-based geohazards. Moreover, it can help users visualize causes, processes, and other aspects of geohazards and assist decision-makers in emergency responses.
Keywords: geologic hazard; network analysis; latent Dirichlet allocation; text classification; deep learning
A bibliometric analysis of worldwide cancer research using machine learning methods
17
Authors: Lianghong Lin, Likeng Liang, Maojie Wang, Runyue Huang, Mengchun Gong, Guangjun Song, Tianyong Hao 《Cancer Innovation》 2023, No. 3, pp. 219-232 (14 pages)
With the progress and development of computer technology, applying machine learning methods to cancer research has become an important research field. To analyze the most recent research status and trends, main research topics, topic evolutions, research collaborations, and potential directions of this field, this study conducts a bibliometric analysis of 6206 research articles worldwide, collected from PubMed between 2011 and 2021, concerning cancer research using machine learning methods. Python is used as a tool for bibliometric analysis, Gephi is used for social network analysis, and the Latent Dirichlet Allocation model is used for topic modeling. The trend analysis of articles not only reflects the innovative research at the intersection of machine learning and cancer but also demonstrates its vigorous development and increasing impact. In terms of journals, Nature Communications is the most influential journal and Scientific Reports is the most prolific one. The United States and Harvard University have contributed the most to cancer research using machine learning methods. As for research topics, "Support Vector Machine," "classification," and "deep learning" have been the core focuses of the research field. The findings help scholars and related practitioners better understand the development status and trends of cancer research using machine learning methods and gain a deeper understanding of research hotspots.
Keywords: bibliometric analysis; cancer; latent Dirichlet allocation; machine learning; research topic; topic evolution
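The Impact of National Natural Science Foundation of China Discipline Plans on Discipline Development: A Bibliometric Analysis and Reflection (Cited by: 7)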
18
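Authors: 陈思华, 邱焓, 霍红 《科学通报》 EI CAS CSCD PKU Core 2022, No. 7, pp. 630-639 (10 pages)
Since the 1950s, when China drew up its first science and technology development plan, the Long-Term Plan for the Development of Science and Technology for 1956-1967, China has consistently used the "planning model" to advance science and technology [1]. The National Natural Science Foundation of China (NSFC) is a key funding body for scientific research in China. Within China's science and technology funding system, the NSFC has its own distinctive funding pattern and planning path, focusing its support on basic research while also addressing major national needs. In order for discipline development to adapt more promptly to the developments and changes of the times and to match major national needs more precisely.
Keywords: National Natural Science Foundation of China; discipline plan; text analysis; latent Dirichlet allocation; fitting rate; research hot spots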
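Spatiotemporal Analysis of the Evolution of Tibetan Human Activities in Qinghai from the Perspective of Place Names (Cited by: 1)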
19
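Authors: 栾桂泽, 彭直琰, 蔡敬芝, 富瑶, 宋璐, 沈克强, 赵飞 《测绘地理信息》 CSCD 2021, No. 5, pp. 163-168 (6 pages)
Taking more than 95,000 place-name records from Qinghai Province as the research sample, this study uses forward stepwise regression analysis, the latent Dirichlet allocation (LDA) model, and other methods to systematically analyze the characteristics of Tibetan place names in Qinghai and the spatiotemporal evolution of human activities. The results show that: (1) the formation of Tibetan settlement areas is mainly influenced by terrain and river systems, with terrain having the greatest influence; (2) after 1949, a large number of ethnic regional autonomy organizations emerged under the influence of ethnic policy; (3) many herders shifted to settled labor, forming today's Tibetan settlement areas; (4) over the past 300 years, the main migration area of Qinghai Tibetans has been Hainan Tibetan Autonomous Prefecture and its surroundings, and this migration spread Tibetan Buddhism to a certain extent.
Keywords: Tibetan settlement areas; place-name research; human activities; stepwise regression analysis; latent Dirichlet allocation (LDA) model; migration analysis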
Topic Model for Chinese Medicine Diagnosis and Prescription Regularities Analysis: Case on Diabetes (Cited by: 7)
20
Authors: 张小平, 周雪忠, 黄厚宽, 冯奇, 陈世波, 刘保延 《Chinese Journal of Integrative Medicine》 SCIE CAS 2011, No. 4, pp. 307-313 (7 pages)
Induction of common knowledge or regularities from large-scale clinical data is a vital task for Chinese medicine (CM). In this paper, we propose a data mining method, called the Symptom-Herb-Diagnosis topic (SHDT) model, to automatically extract the common relationships among symptoms, herb combinations, and diagnoses from large-scale CM clinical data. The SHDT model is one of the multi-relational extensions of the latent topic model, which can acquire topic structure from discrete corpora (such as document collections) by capturing the semantic relations among words. We applied the SHDT model to discover common CM diagnosis and treatment knowledge for type 2 diabetes mellitus (T2DM) using 3,238 inpatient cases. We obtained meaningful diagnosis and treatment topics (clusters) from the data, which clinically indicated some important medical groups corresponding to comorbidity diseases (e.g., heart disease and diabetic kidney diseases in T2DM inpatients). The results show that manifestation sub-categories actually exist in T2DM patients that need specific, individualised CM therapies. Furthermore, the results demonstrate that this method is helpful for generating CM clinical guidelines for T2DM based on structured collected clinical data.
Keywords: latent Dirichlet allocation; Author-Topic model; Dirichlet prior; Chinese medicine; Symptom-Herb-Diagnosis topic model