19 articles found
Fuzzy c-means text clustering based on topic concept sub-space (cited by: 3)
1
Authors: 吉翔华 陈超 邵正荣 俞能海. Journal of Southeast University (English Edition). EI CAS. 2007, Issue 3, pp. 439-442 (4 pages)
To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Five evaluation functions are combined to extract key phrases. Concept phrases, as well as the descriptions of the final clusters, are generated from the key phrases using WordNet. The initial centers and the membership matrix are the most important factors affecting clustering performance. Orthogonal concept topic sub-spaces are built from the topic concept phrases that represent the topics of the texts, and the initialization of the centers and the membership matrix depends on the concept vectors in the sub-spaces. The results show that, unlike the random initialization of traditional fuzzy c-means clustering, initialization based on text content can improve clustering precision.
Keywords: TCS2FCM; topic concept space; fuzzy c-means clustering; text clustering
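The role of the initial centers in fuzzy c-means can be illustrated with a minimal sketch. This is the standard fuzzy c-means update, not the TCS2FCM method itself: the function names, the fuzzifier `m=2.0`, and the idea of passing in content-derived centers as plain coordinate lists are assumptions made for illustration.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: each row sums to 1 across clusters."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on one or more centers is shared among them.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [v / total for v in row]
        else:
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the membership-weighted mean of the points."""
    dims = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dims)])
    return centers

def fuzzy_c_means(points, initial_centers, m=2.0, iterations=20):
    """Run FCM from caller-supplied (non-random) initial centers."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        memberships = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, memberships, m)
    return centers, fcm_memberships(points, centers, m)
```

Because the initial centers are an explicit argument, a content-based initialization (such as concept vectors) can be swapped in for random seeds without changing the update loop.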
Ontology-based similarity measure for text clustering (cited by: 1)
2
Authors: 颜端武 李晓鹏 王磊 成晓. Journal of Southeast University (English Edition). EI CAS. 2006, Issue 3, pp. 389-393 (5 pages)
A method that combines category-based and keyword-based concepts for a better information retrieval system is introduced. To improve document clustering, a document similarity measure based on the cosine vector and keyword frequency in documents, together with an input ontology, is proposed. The ontology is domain specific and includes a list of keywords organized by degree of importance to the categories of the ontology. By means of semantic knowledge, the ontology can improve the effectiveness of the document similarity measure and the feedback of information retrieval systems. Two approaches to evaluating the performance of this similarity measure, and a comparison with the standard cosine vector similarity measure, are also described.
Keywords: similarity measure; text clustering; ontology; information retrieval system
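As a rough illustration of how keyword importance could enter a cosine measure, the sketch below scales term frequencies by a per-keyword weight before computing cosine similarity. The weighting scheme, the `keyword_weights` dictionary, and the whitespace tokenization are assumptions for illustration; the abstract does not give the paper's actual formula.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def weighted_vector(text, keyword_weights):
    """Term-frequency vector; terms listed in the (hypothetical) ontology
    keyword table are scaled by their importance, others keep weight 1."""
    counts = Counter(text.lower().split())
    return {term: freq * keyword_weights.get(term, 1.0)
            for term, freq in counts.items()}
```

With an empty weight table the function reduces to plain term-frequency cosine, which gives a baseline for comparing the two measures.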
Agricultural Ontology Based Feature Optimization for Agricultural Text Clustering (cited by: 4)
3
Authors: SU Ya-ru WANG Ru-jing CHEN Peng WEI Yuan-yuan LI Chuan-xi HU Yi-min. Journal of Integrative Agriculture. SCIE CAS CSCD. 2012, Issue 5, pp. 752-759 (8 pages)
Feature optimization is important to agricultural text mining. Usually, the vector space model is used to represent text documents. However, this basic approach still suffers from two drawbacks: the curse of dimensionality and the lack of semantic information. In this paper, a novel ontology-based feature optimization method for agricultural text is proposed. First, terms of the vector space model are mapped to concepts of the agricultural ontology, and the concept frequency weights are computed statistically from the term frequency weights; second, concept similarity weights are assigned to the concept features according to the structure of the agricultural ontology. By combining feature frequency weights and feature similarity weights based on the agricultural ontology, the dimensionality of the feature space can be reduced drastically. Moreover, semantic information can be incorporated into this method. The results show that feature optimization with this method yields a significant improvement in agricultural text clustering.
Keywords: agricultural ontology; feature optimization; agricultural text clustering
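The first step described above, mapping terms to ontology concepts and aggregating their weights, can be sketched as follows. The term-to-concept table and the decision to drop unmapped terms are hypothetical choices for illustration; the abstract does not specify how unmapped terms are handled.

```python
from collections import defaultdict

def concept_frequencies(term_weights, term_to_concept):
    """Aggregate term frequency weights into concept frequency weights.

    Terms without an entry in the (hypothetical) term-to-concept table
    are dropped, which is one way the feature space shrinks.
    """
    concept_weights = defaultdict(float)
    for term, weight in term_weights.items():
        concept = term_to_concept.get(term)
        if concept is not None:
            concept_weights[concept] += weight
    return dict(concept_weights)
```

For example, if "wheat" and "maize" both map to a "cereal" concept, their two term features collapse into one concept feature.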
A New Feature Selection Method for Text Clustering (cited by: 3)
4
Authors: XU Junling XU Baowen ZHANG Weifeng CUI Zifeng ZHANG Wei. Wuhan University Journal of Natural Sciences. CAS. 2007, Issue 5, pp. 912-916 (5 pages)
Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering because class label information is unavailable. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It applies a supervised feature selection method to the intermediate clustering results generated during iterative clustering in order to select features for text clustering; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments are carried out on several popular datasets, and the results show the advantages of the proposed method.
Keywords: feature selection; text clustering; unsupervised learning; data preprocessing
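For reference, the Davies-Bouldin index mentioned above can be computed as in the following minimal sketch, using Euclidean distance and cluster centroids. The function names are illustrative, and how the paper plugs the index into its selection loop is not shown here.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    Lower values indicate compact, well-separated clusters.
    Requires at least two clusters with distinct centroids.
    """
    centroids = [centroid(c) for c in clusters]
    # Scatter: mean distance of a cluster's points to its centroid.
    scatters = [sum(math.dist(p, ctr) for p in c) / len(c)
                for c, ctr in zip(clusters, centroids)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatters[i] + scatters[j])
                     / math.dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k
```

Evaluating the index for each candidate feature subset and plotting the values gives the kind of curve the abstract refers to.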
An Incremental Algorithm of Text Clustering Based on Semantic Sequences (cited by: 1)
5
Authors: FENG Zhonghui SHEN Junyi BAO Junpeng. Wuhan University Journal of Natural Sciences. CAS. 2006, Issue 5, pp. 1340-1344 (5 pages)
This paper proposes an incremental text clustering algorithm based on semantic sequences. Using the similarity relation of semantic sequences and calculating the cover of the set of similar semantic sequences, the algorithm selects the candidate cluster with the minimum entropy overlap value as a result cluster at each step. A comparison of experimental results shows that the precision of the algorithm is higher than that of other algorithms under the same conditions, especially on sets of long documents.
Keywords: text clustering; semantic sequence; entropy
Similarity matrix-based K-means algorithm for text clustering
6
Authors: 曹奇敏 郭巧 吴向华. Journal of Beijing Institute of Technology. EI CAS. 2015, Issue 4, pp. 566-572 (7 pages)
The K-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. The improved algorithm avoids the random selection of initial center points, so it provides effective initial points for the clustering process and reduces the fluctuation of clustering results caused by the selection of initial points; thus a better clustering quality can be obtained. The experimental results also show that the F-measure of the improved K-means algorithm is greatly improved and the clustering results are more stable.
Keywords: text clustering; K-means algorithm; similarity matrix; F-measure
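The abstract does not say how the similarity matrix determines the initial centers, so the following is only one plausible, deterministic heuristic and not the paper's algorithm: take the document with the highest total similarity as the first seed, then repeatedly add the document least similar to the seeds chosen so far. The function name `select_seeds` is hypothetical.

```python
def select_seeds(similarity, k):
    """Pick k seed indices from a symmetric similarity matrix.

    Hypothetical heuristic: start from the most 'central' document,
    then add the document whose highest similarity to any chosen seed
    is lowest. The result is deterministic for a given matrix.
    """
    n = len(similarity)
    first = max(range(n), key=lambda i: sum(similarity[i]))
    seeds = [first]
    while len(seeds) < k:
        candidates = [i for i in range(n) if i not in seeds]
        nxt = min(candidates,
                  key=lambda i: max(similarity[i][s] for s in seeds))
        seeds.append(nxt)
    return seeds
```

Because no random number is involved, repeated runs on the same matrix yield the same seeds, which is the property the abstract credits for the more stable clustering results.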
The Refinement Algorithm Consideration in Text Clustering Scheme Based on Multilevel Graph
7
Authors: CHEN Jian-bin DONG Xiang-jun SONG Han-tao. Wuhan University Journal of Natural Sciences. EI CAS. 2004, Issue 5, pp. 671-675 (5 pages)
To construct a highly efficient text clustering algorithm, the multilevel graph model and the refinement algorithm used in the uncoarsening phase are discussed. The model is applied to text clustering. The performance of the clustering algorithm has to be improved by applying the refinement algorithm. The experimental results demonstrate that the multilevel graph text clustering algorithm is feasible.
CLC number: TP301
Foundation item: Supported by the National Natural Science Foundation of China (60173051)
Biography: CHEN Jian-bin (1970-), male, Associate Professor, Ph.D.; research direction: data mining.
Keywords: text clustering; multilevel coarsen graph model; refinement algorithm; high-dimensional clustering
FICW: Frequent Itemset Based Text Clustering with Window Constraint
8
Authors: ZHOU Chong LU Yansheng ZOU Lei HU Rong. Wuhan University Journal of Natural Sciences. CAS. 2006, Issue 5, pp. 1345-1351 (7 pages)
Most existing text clustering algorithms overlook the fact that a document is a word sequence with semantic information. Important semantic information exists in the positions of words in the sequence. In this paper, a novel method named Frequent Itemset-based Clustering with Window (FICW) is proposed, which makes use of this semantic information for text clustering with a window constraint. The experimental results obtained from tests on three (hypertext) text sets show that FICW outperforms the compared method in both clustering accuracy and efficiency.
Keywords: text clustering; frequent itemsets; search engine
Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data
9
Authors: Yu Jiang Dengwen Yu Mingzhao Zhao Hongtao Bai Chong Wang Lili He. Computers, Materials & Continua. SCIE EI. 2020, Issue 7, pp. 207-216 (10 pages)
Semi-supervised clustering improves learning performance by using a small number of labeled samples to assist the learning of untagged samples. This paper implements and compares unsupervised and semi-supervised clustering analysis of BOA-Argo ocean text data. Unsupervised K-Means and Affinity Propagation (AP) are two classical clustering algorithms. The Election-AP algorithm is proposed to handle the final cluster number in AP clustering, as it has proved difficult to keep within a suitable range. The semi-supervised samples are thermocline data selected from the BOA-Argo dataset according to the standard thermocline definition, and these data are used for semi-supervised cluster analysis. Several semi-supervised clustering algorithms were chosen for comparison of learning performance: Constrained-K-Means, Seeded-K-Means, SAP (Semi-supervised Affinity Propagation), LSAP (Loose Seed AP), and CSAP (Compact Seed AP). To accommodate the single label, this paper improves the above algorithms to SCKM (improved Constrained-K-Means), SSKM (improved Seeded-K-Means), and SSAP (improved Semi-supervised Affinity Propagation) to perform semi-supervised clustering analysis on the data. A DSAP (Double Seed AP) semi-supervised clustering algorithm based on compact seeds is proposed, and the experimental data show that DSAP has a better clustering effect. The unsupervised and semi-supervised clustering results are used to analyze the potential patterns of marine data.
Keywords: unsupervised learning; semi-supervised learning; text clustering
Using ontology semantics to improve text documents clustering (cited by: 8)
10
Authors: 罗娜 左万利 袁福宇 张靖波 张慧杰. Journal of Southeast University (English Edition). EI CAS. 2006, Issue 3, pp. 370-374 (5 pages)
In order to improve the clustering results and the selection among the results, ontology semantics are combined with document clustering. A new document clustering algorithm based on WordNet in the phase of document processing is proposed. First, after the documents are represented by tf-idf, every word vector is extended with new entities. Then the feature extraction algorithm is applied to the documents. Finally, the ontology aggregation clustering (OAC) algorithm is proposed to improve the result of document clustering. Experiments are based on the data set of Reuters 20 News Group, and the experimental results are compared with the results obtained by mutual information (MI). The conclusion is that the proposed ontology-based document clustering algorithm is better than other existing clustering algorithms such as MNB, CLUTO, and co-clustering.
Keywords: ontology; text clustering; lexicon; WordNet
A MODIFIED ANT-BASED TEXT CLUSTERING ALGORITHM WITH SEMANTIC SIMILARITY MEASURE (cited by: 2)
11
Authors: Haoxiang XIA Shuguang WANG Taketoshi YOSHIDA. Journal of Systems Science and Systems Engineering. SCIE EI CSCD. 2006, Issue 4, pp. 474-492 (19 pages)
Ant-based text clustering is a promising technique that has attracted great research attention. This paper attempts to improve the standard ant-based text clustering algorithm in two respects. On the one hand, an ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to provide a more accurate assessment of the similarity between documents. On the other hand, the ant behavior model is modified to pursue better algorithmic performance. In particular, the ant movement rule is adjusted so as to direct a laden ant toward a dense area of items of the same type as the item it is carrying, and to direct an unladen ant toward an area that contains an item dissimilar to the surrounding items within its Moore neighborhood. Using WordNet as the base ontology for assessing the semantic similarity between documents, the proposed algorithm is tested on a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that the proposed algorithm performs better than the standard ant-based text clustering algorithm and the k-means algorithm.
Keywords: ant-based clustering; text clustering; ant movement rule; semantic similarity measure
Text clustering based on fusion of ant colony and genetic algorithms
12
Authors: Yun ZHANG Boqin FENG Shouqiang MA Lianmeng LIU. Frontiers of Electrical and Electronic Engineering in China. CSCD. 2009, Issue 1, pp. 15-19 (5 pages)
Focusing on the problem that the ant colony algorithm easily stagnates and cannot fully search the solution space, a text clustering approach based on the fusion of the ant colony and genetic algorithms is proposed. The four parameters that influence the performance of the ant colony algorithm are encoded as chromosomes, and the fitness function and the selection, crossover, and mutation operators are designed to find the optimal combination of parameters through a number of iterations; the result is then applied to text clustering. The simulation results show that, compared with classical k-means clustering and the basic ant colony clustering algorithm, the proposed algorithm performs better, and the F-measure is improved by 5.69%, 48.60%, and 69.60%, respectively, on 3 test datasets. Therefore, it is more suitable for processing larger datasets.
Keywords: ant colony clustering; genetic algorithm; fusion; text clustering
The research and realization about automatic abstracting based on text clustering and natural language understanding
13
Authors: GUO Qing-lin FAN Xiao-zhong LIU Chang-an. Frontiers of Electrical and Electronic Engineering in China. CSCD. 2006, Issue 4, pp. 460-464 (5 pages)
A method of automatic abstracting based on text clustering and natural language understanding is explored, aimed at overcoming the shortcomings of some current methods. The method makes use of text clustering and can perform automatic abstracting of multiple documents. A two-pass word segmentation algorithm based on the title and the first sentences of paragraphs is investigated. Its precision and recall are above 95%. For a specific domain, plastics, an automatic abstracting system named TCAAS is implemented. The precision and recall of multi-document automatic abstracting are above 75%. The experiments also show that it is feasible to use the method to develop a domain-specific automatic abstracting system, which is valuable for further in-depth study.
Keywords: automatic abstracting; text clustering; natural language understanding
News Text Topic Clustering Optimized Method Based on TF-IDF Algorithm on Spark (cited by: 17)
14
Authors: Zhuo Zhou Jiaohua Qin Xuyu Xiang Yun Tan Qiang Liu Neal N. Xiong. Computers, Materials & Continua. SCIE EI. 2020, Issue 1, pp. 217-231 (15 pages)
Because text topic clustering is slow on a stand-alone architecture in the context of big data, this paper takes news text as the research object and proposes an LDA text topic clustering algorithm based on the Spark big data platform. Since the TF-IDF (term frequency-inverse document frequency) algorithm under Spark is irreversible with respect to word mapping, the mapped word indexes cannot be traced back to the original words. In this paper, an optimized TF-IDF method under Spark is proposed to ensure that the text words can be restored. First, the text features are extracted by the TF-IDF algorithm combined with CountVectorizer proposed in this paper, and then the features are input to the LDA (Latent Dirichlet Allocation) topic model for training. Finally, the text topic clustering is obtained. Experimental results show that, for large data samples, the processing speed of LDA topic model clustering is improved on Spark. At the same time, compared with the LDA topic model based on word frequency input, the proposed model has lower perplexity.
Keywords: news text topic clustering; Spark platform; CountVectorizer algorithm; TF-IDF algorithm; latent Dirichlet allocation model
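The reversibility issue can be illustrated without Spark. A hashed term index cannot be mapped back to a word, whereas an explicit vocabulary (the role CountVectorizer plays in the abstract) keeps an index-to-term table. The pure-Python sketch below is an illustration under assumptions, not the paper's implementation: the function names are hypothetical, and the smoothed IDF form log((N+1)/(df+1)) is assumed.

```python
import math
from collections import Counter

def build_vocabulary(docs):
    """Explicit term -> index table, so indexes can be mapped back to terms."""
    terms = sorted({t for doc in docs for t in doc})
    return {term: idx for idx, term in enumerate(terms)}

def tfidf(docs, vocab):
    """Sparse TF-IDF vectors keyed by vocabulary index.

    Assumes the smoothed IDF log((N + 1) / (df + 1)).
    """
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({vocab[t]: tf[t] * math.log((n_docs + 1) / (df[t] + 1))
                        for t in tf})
    return vectors

def top_terms(vector, vocab, n=2):
    """Recover the highest-weighted terms from a vector via the vocabulary."""
    index_to_term = {idx: term for term, idx in vocab.items()}
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [index_to_term[idx] for idx, _ in ranked[:n]]
```

Keeping the vocabulary alongside the vectors is what allows topic-model output, which is expressed in term indexes, to be reported as readable words.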
Genetic-Frog-Leaping Algorithm for Text Document Clustering (cited by: 1)
15
Authors: Lubna Alhenak Manar Hosny. Computers, Materials & Continua. SCIE EI. 2019, Issue 9, pp. 1045-1074 (30 pages)
In recent years, the volume of information in digital form has increased tremendously owing to the increased popularity of the World Wide Web. As a result, the use of techniques for extracting useful information from large collections of data, and particularly documents, has become more necessary and challenging. Text clustering is such a technique; it consists of dividing a set of text documents into clusters (groups), so that documents within the same cluster are closely related, whereas documents in different clusters are as different as possible. Clustering depends on measuring the content (i.e., words) of a document in terms of relevance. Nevertheless, as documents usually contain a large number of words, some of them may be irrelevant to the topic under consideration or redundant. This can confuse and complicate the clustering process and make it less accurate. Accordingly, feature selection methods have been employed to reduce data dimensionality by selecting the most relevant features. In this study, we developed a text document clustering optimization model using a novel genetic frog-leaping algorithm that efficiently clusters text documents based on selected features. The proposed approach is based on two metaheuristic algorithms: a genetic algorithm (GA) and a shuffled frog-leaping algorithm (SFLA). The GA performs feature selection, and the SFLA performs clustering. To evaluate its effectiveness, the proposed approach was tested on a well-known text document dataset: the "20Newsgroup" dataset from the University of California Irvine Machine Learning Repository. Overall, after multiple experiments were compared and analyzed, it was demonstrated that using the proposed algorithm on the 20Newsgroup dataset greatly facilitated text document clustering, compared with classical K-means clustering. Nevertheless, this improvement requires longer computational time.
Keywords: text documents clustering; meta-heuristic algorithms; shuffled frog-leaping algorithm; genetic algorithm; feature selection
A comparative analysis of text representation, classification and clustering methods over real project proposals
16
Authors: Meltem Aksoy Seda Yanık Mehmet Fatih Amasyali. International Journal of Intelligent Computing and Cybernetics. EI. 2023, Issue 3, pp. 595-628 (34 pages)
Purpose: When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to grouping proposals are primarily based on manual matching of similar topics, discipline areas, and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.
Design/methodology/approach: This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering, and classification techniques. The proposed model uses FastText, BERT, and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids, and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART), and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.
Findings: The results show that the generated word embeddings can effectively represent proposal texts as vectors and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.
Research limitations/implications: The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.
Originality/value: This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing research studies focus on the automatic grouping of proposals; traditional frequency-based word embedding methods are used as feature extraction methods to represent project proposals. Unlike previous research, this study employs two outperforming neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, there has been no research conducted on the grouping of project proposals in Turkish.
Keywords: project proposal selection; text mining; word embedding; text clustering; text classification
The Analysis of China’s Integrity Situation Based on Big Data (cited by: 1)
17
Authors: Wangdong Jiang Taian Yang Guang Sun Yucai Li Yixuan Tang Hongzhang Lv Wenqian Xiang. Journal on Big Data. 2019, Issue 3, pp. 117-134 (18 pages)
To study in depth the prominent problems faced by China’s clean government work and to put forward effective coping strategies, this article uses big data technology to analyze online information about anti-corruption news events. In this study, we take the news reports on the website of the Communist Party of China (CPC) Central Commission for Discipline Inspection (CCDI) as the source of data. First, the obtained text data undergo word segmentation and stop-word removal during preprocessing; then the preprocessed data are vectorized and clustered; finally, after text clustering, the key words of clean government work are derived through visualization analysis. The results show that China’s clean government work should focus on the issue of "the four forms of decadence", and that the related departments must strictly crack down on five categories of phenomena: "illegal payment of subsidies or benefits, illegal delivery of gifts and cash gifts, illegal use of official vehicles, banquets using public funds, extravagant wedding ceremonies and funerals". The results of this study are consistent with the official data released on the CCDI’s website, which also suggests that the method is feasible and effective.
Keywords: big data; anti-corruption; text clustering; visualization
Unsupervised Graph-Based Tibetan Multi-Document Summarization
18
Authors: Xiaodong Yan Yiqin Wang Wei Song Xiaobing Zhao A.Run Yang Yanxing. Computers, Materials & Continua. SCIE EI. 2022, Issue 10, pp. 1769-1781 (13 pages)
Text summarization creates a subset that represents the most important or relevant information in the original content, which effectively reduces information redundancy. Recently, neural network methods have achieved good results in text summarization for both Chinese and English, but research on text summarization in low-resource languages, especially Tibetan, is still at an exploratory stage. Moreover, there is no large-scale annotated corpus for text summarization. The lack of datasets severely limits the development of low-resource text summarization. In this situation, unsupervised learning approaches are more appealing for low-resource languages, as they do not require labeled data. In this paper, we propose an unsupervised graph-based Tibetan multi-document summarization method, which divides a large number of Tibetan news documents into topics and extracts a summary of each topic. Summaries obtained with traditional graph-based methods have high redundancy, and the division of documents into topics is not detailed enough. For topic division, we adopt a two-level clustering method that converts the original documents into document-level and sentence-level graphs; we then take both linguistic and deep representations into account and integrate an external corpus into the graph to obtain sentence semantic clusters. This addresses the shortcomings of the traditional K-Means clustering method and yields a more detailed clustering of documents. The sentence clusters are then modeled as graphs; finally, the sentence nodes are re-scored based on the topic semantic information and the impact of topic features on sentences, and a summary with higher topic relevance is extracted. To promote the development of Tibetan text summarization and to meet the needs of researchers for high-quality Tibetan text summarization datasets, this paper manually constructs a Tibetan summarization dataset and carries out relevant experiments. The experimental results show that our method can effectively improve the quality of summarization and is competitive with previous unsupervised methods.
Keywords: multi-document summarization; text clustering; topic feature fusion; graphic model
Uncovering Topics of Public Cultural Activities: Evidence from China
19
Authors: Zixin Zeng Bolin Hua. Data Intelligence. EI. 2022, Issue 3, pp. 509-528 (20 pages)
In this study, we uncover the topics of Chinese public cultural activities in 2020 with a two-step short text clustering (self-taught neural networks and graph-based clustering) and topic modeling approach. The dataset we use for this research is collected from 108 websites of libraries and cultural centers and contains over 17,000 articles. With the novel framework we propose, we derive 3 clusters and 8 topics from 21 provincial-level regions in China. By plotting the topic distribution of each cluster, we are able to show the distinct tendencies of local cultural institutes, that is, free lessons and lectures on art and culture, entertainment and services for socially vulnerable groups, and the preservation of intangible cultural heritage, respectively. The findings of our study provide decision-making support for cultural institutes, thus promoting public cultural service from a data-driven perspective.
Keywords: public culture; short text clustering; topic modeling; LDA; big data analysis