1,207 articles found
Fuzzy c-means text clustering based on topic concept sub-space (Cited by: 3)
1
Authors: 吉翔华, 陈超, 邵正荣, 俞能海. Journal of Southeast University (English Edition), EI, CAS, 2007, No. 3, pp. 439-442, 4 pages
To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Five evaluation functions are combined to extract key phrases. Concept phrases, as well as the descriptions of the final clusters, are derived from the key phrases using WordNet. The initial centers and the membership matrix are the most important factors affecting clustering performance. Orthogonal concept topic sub-spaces are built with the topic concept phrases representing the topics of the texts, and the initialization of the centers and the membership matrix depends on the concept vectors in the sub-spaces. The results show that, unlike the random initialization of traditional fuzzy c-means clustering, initialization based on text content can improve clustering precision.
Keywords: TCS2FCM; topic concept space; fuzzy c-means clustering; text clustering
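The abstract above contrasts random initialization with content-derived initialization but gives no formulas. As a rough illustration only, the sketch below runs the standard fuzzy c-means updates starting from caller-supplied centers. The function name `fcm`, the fuzzifier `m=2.0`, and the toy 2-D points are assumptions made for this example; this is not the TCS2FCM method itself.

```python
import math


def fcm(points, centers, m=2.0, iters=20):
    """Standard fuzzy c-means (illustrative sketch, not TCS2FCM).

    `centers` is the caller-supplied initialization, standing in for
    centers derived from document content instead of random choice.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    u = []
    for _ in range(iters):
        # Membership update: u[j][i] is how strongly point j belongs to cluster i.
        u = []
        for p in points:
            d = [max(dist(p, c), 1e-12) for c in centers]  # avoid division by zero
            u.append([
                1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(d)))
                for i in range(len(d))
            ])
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for i in range(len(centers)):
            w = [u[j][i] ** m for j in range(len(points))]
            total = sum(w)
            new_centers.append([
                sum(w[j] * points[j][k] for j in range(len(points))) / total
                for k in range(len(points[0]))
            ])
        centers = new_centers
    return centers, u
```

Each membership row sums to 1, so a document can belong partly to several clusters; a well-chosen starting point mainly reduces the number of iterations and the risk of a poor local optimum.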
Ontology-based similarity measure for text clustering (Cited by: 1)
2
Authors: 颜端武, 李晓鹏, 王磊, 成晓. Journal of Southeast University (English Edition), EI, CAS, 2006, No. 3, pp. 389-393, 5 pages
A method that combines category-based and keyword-based concepts for a better information retrieval system is introduced. To improve document clustering, a document similarity measure is proposed that is based on the cosine vector measure and keyword frequency in documents and that also takes an ontology as input. The ontology is domain specific and includes a list of keywords organized by degree of importance to the categories of the ontology. By means of semantic knowledge, the ontology can improve the effectiveness of the document similarity measure and the feedback of information retrieval systems. Two approaches to evaluating the performance of this similarity measure, and a comparison with the standard cosine vector similarity measure, are also described.
Keywords: similarity measure; text clustering; ontology; information retrieval system
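The abstract above does not state how the ontology enters the similarity computation. One plausible reading, sketched here under that assumption, is to scale term frequencies by keyword importance weights before taking the cosine. The function name `weighted_cosine` and the `weights` mapping are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def weighted_cosine(doc_a, doc_b, weights=None):
    """Cosine similarity over term-frequency vectors.

    `weights` is a hypothetical keyword -> importance mapping standing in
    for the domain ontology; terms without an entry keep weight 1.0.
    """
    weights = weights or {}

    def vectorize(text):
        counts = Counter(text.lower().split())
        return {term: n * weights.get(term, 1.0) for term, n in counts.items()}

    va, vb = vectorize(doc_a), vectorize(doc_b)
    dot = sum(va[t] * vb[t] for t in va if t in vb)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

With `weights=None` this reduces to the standard cosine measure, which gives a baseline for the kind of comparison the abstract mentions.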
Agricultural Ontology Based Feature Optimization for Agricultural Text Clustering (Cited by: 4)
3
Authors: SU Ya-ru, WANG Ru-jing, CHEN Peng, WEI Yuan-yuan, LI Chuan-xi, HU Yi-min. Journal of Integrative Agriculture, SCIE, CAS, CSCD, 2012, No. 5, pp. 752-759, 8 pages
Feature optimization is important to agricultural text mining. Usually, the vector space model is used to represent text documents. However, this basic approach suffers from two drawbacks: the curse of dimensionality and the lack of semantic information. In this paper, a novel ontology-based feature optimization method for agricultural text is proposed. First, terms of the vector space model are mapped onto concepts of the agricultural ontology, whose concept frequency weights are computed statistically from the term frequency weights; second, concept similarity weights are assigned to the concept features according to the structure of the agricultural ontology. By combining feature frequency weights and feature similarity weights based on the agricultural ontology, the dimensionality of the feature space can be reduced drastically. Moreover, semantic information can be incorporated into the method. The results show that feature optimization with this method yields a significant improvement in agricultural text clustering.
Keywords: agricultural ontology; feature optimization; agricultural text clustering
A New Feature Selection Method for Text Clustering (Cited by: 3)
4
Authors: XU Junling, XU Baowen, ZHANG Weifeng, CUI Zifeng, ZHANG Wei. Wuhan University Journal of Natural Sciences, CAS, 2007, No. 5, pp. 912-916, 5 pages
Feature selection methods have been successfully applied to text categorization but seldom to text clustering, because class label information is unavailable. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It applies a supervised feature selection method to the intermediate clustering results generated during iterative clustering; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments are carried out on several popular datasets, and the results show the advantages of the proposed method.
Keywords: feature selection; text clustering; unsupervised learning; data preprocessing
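For readers unfamiliar with the validity measure named above, the following is a minimal sketch of the standard Davies-Bouldin index with Euclidean distance, where lower values indicate more compact and better separated clusters. The function name and the nested-list input layout are assumptions for illustration, and the sketch says nothing about how the paper couples the index with feature selection.

```python
import math


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters.

    `clusters` is a list of clusters, each a list of equal-length numeric
    vectors. At least two non-empty clusters with distinct centroids are
    assumed.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Centroid of each cluster: the per-dimension mean.
    centroids = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    # Scatter of each cluster: mean distance of its points to the centroid.
    scatter = [
        sum(dist(p, centroids[i]) for p in c) / len(c)
        for i, c in enumerate(clusters)
    ]
    k = len(clusters)
    # For each cluster, take its worst (largest) similarity ratio, then average.
    return sum(
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ) / k
```

Evaluating this value for each candidate feature subset yields the kind of curve from which a subset could be picked.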
An Incremental Algorithm of Text Clustering Based on Semantic Sequences (Cited by: 1)
5
Authors: FENG Zhonghui, SHEN Junyi, BAO Junpeng. Wuhan University Journal of Natural Sciences, CAS, 2006, No. 5, pp. 1340-1344, 5 pages
This paper proposes an incremental text clustering algorithm based on semantic sequences. Using the similarity relation between semantic sequences and calculating the cover of the set of similar semantic sequences, the algorithm selects the candidate cluster with the minimum entropy overlap value as a result cluster at each step. A comparison of experimental results shows that the precision of the algorithm is higher than that of other algorithms under the same conditions, and the advantage is especially clear on sets of long documents.
Keywords: text clustering; semantic sequence; entropy
Similarity matrix-based K-means algorithm for text clustering
6
Authors: 曹奇敏, 郭巧, 吴向华. Journal of Beijing Institute of Technology, EI, CAS, 2015, No. 4, pp. 566-572, 7 pages
The K-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. The improved algorithm avoids the random selection of initial center points, so it provides effective initial points for the clustering process and reduces the fluctuation in clustering results caused by initial point selection, and a better clustering quality can thus be obtained. The experimental results also show that the F-measure of the improved K-means algorithm is greatly improved and that the clustering results are more stable.
Keywords: text clustering; K-means algorithm; similarity matrix; F-measure
The Refinement Algorithm Consideration in Text Clustering Scheme Based on Multilevel Graph
7
Authors: CHEN Jian-bin, DONG Xiang-jun, SONG Han-tao. Wuhan University Journal of Natural Sciences, EI, CAS, 2004, No. 5, pp. 671-675, 5 pages
To construct a highly efficient text clustering algorithm, the multilevel graph model and the refinement algorithm used in the uncoarsening phase are discussed. The model is applied to text clustering. The performance of the clustering algorithm has to be improved by applying the refinement algorithm. The experimental results demonstrate that the multilevel graph text clustering algorithm is usable.
CLC number: TP301. Foundation item: Supported by the National Natural Science Foundation of China (60173051). Biography: CHEN Jian-bin (1970-), male, associate professor, Ph.D.; research direction: data mining.
Keywords: text clustering; multilevel coarsen graph model; refinement algorithm; high-dimensional clustering
FICW: Frequent Itemset Based Text Clustering with Window Constraint
8
Authors: ZHOU Chong, LU Yansheng, ZOU Lei, HU Rong. Wuhan University Journal of Natural Sciences, CAS, 2006, No. 5, pp. 1345-1351, 7 pages
Most existing text clustering algorithms overlook the fact that a document is a word sequence carrying semantic information. Important semantic information lies in the positions of words in the sequence. In this paper, a novel method named Frequent Itemset-based Clustering with Window (FICW) is proposed, which makes use of this semantic information for text clustering under a window constraint. The experimental results obtained from tests on three (hypertext) text sets show that FICW outperforms the compared method in both clustering accuracy and efficiency.
Keywords: text clustering; frequent itemsets; search engine
Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data
9
Authors: Yu Jiang, Dengwen Yu, Mingzhao Zhao, Hongtao Bai, Chong Wang, Lili He. Computers, Materials & Continua, SCIE, EI, 2020, No. 7, pp. 207-216, 10 pages
Semi-supervised clustering improves learning performance by using a small number of labeled samples to assist learning on unlabeled samples. This paper implements and compares unsupervised and semi-supervised clustering analysis of BOA-Argo ocean text data. Unsupervised K-Means and Affinity Propagation (AP) are two classical clustering algorithms. The Election-AP algorithm is proposed to handle the final cluster number in AP clustering, which has proved difficult to keep within a suitable range. Thermocline data in the BOA-Argo dataset are sampled according to the standard definition of the thermocline, and these data are used for semi-supervised cluster analysis. Several semi-supervised clustering algorithms were chosen for comparison of learning performance: Constrained-K-Means, Seeded-K-Means, SAP (Semi-supervised Affinity Propagation), LSAP (Loose Seed AP) and CSAP (Compact Seed AP). To adapt to the single label, this paper improves the above algorithms into SCKM (improved Constrained-K-Means), SSKM (improved Seeded-K-Means), and SSAP (improved Semi-supervised Affinity Propagation) to perform semi-supervised clustering analysis on the data. A DSAP (Double Seed AP) semi-supervised clustering algorithm based on compact seeds is proposed, and the experimental data show that DSAP has a better clustering effect. The unsupervised and semi-supervised clustering results are used to analyze the potential patterns of marine data.
Keywords: unsupervised learning; semi-supervised learning; text clustering
Using ontology semantics to improve text documents clustering (Cited by: 8)
10
Authors: 罗娜, 左万利, 袁福宇, 张靖波, 张慧杰. Journal of Southeast University (English Edition), EI, CAS, 2006, No. 3, pp. 370-374, 5 pages
To improve clustering results and the selection among them, ontology semantics are combined with document clustering. A new document clustering algorithm based on WordNet in the document processing phase is proposed. First, after the documents are represented by tf-idf, every word vector is extended with new entities. Then the feature extraction algorithm is applied to the documents. Finally, the ontology aggregation clustering (OAC) algorithm is proposed to improve the result of document clustering. Experiments are based on the data set of Reuters 20 News Group, and the experimental results are compared with the results obtained by mutual information (MI). The conclusion is that the proposed ontology-based document clustering algorithm is better than other existing clustering algorithms such as MNB, CLUTO, and co-clustering.
Keywords: ontology; text clustering; lexicon; WordNet
A MODIFIED ANT-BASED TEXT CLUSTERING ALGORITHM WITH SEMANTIC SIMILARITY MEASURE (Cited by: 2)
11
Authors: Haoxiang XIA, Shuguang WANG, Taketoshi YOSHIDA. Journal of Systems Science and Systems Engineering, SCIE, EI, CSCD, 2006, No. 4, pp. 474-492, 19 pages
Ant-based text clustering is a promising technique that has attracted great research attention. This paper attempts to improve the standard ant-based text-clustering algorithm in two dimensions. On the one hand, the ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to provide a more accurate assessment of the similarity between documents. On the other hand, the ant behavior model is modified to pursue better algorithmic performance. In particular, the ant movement rule is adjusted so as to direct a laden ant toward a dense area of the same type of items as the item it carries, and to direct an unladen ant toward an area that contains an item dissimilar to the surrounding items within its Moore neighborhood. Using WordNet as the base ontology for assessing the semantic similarity between documents, the proposed algorithm is tested with a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that the proposed algorithm performs better than the standard ant-based text-clustering algorithm and the k-means algorithm.
Keywords: ant-based clustering; text clustering; ant movement rule; semantic similarity measure
Text clustering based on fusion of ant colony and genetic algorithms
12
Authors: Yun ZHANG, Boqin FENG, Shouqiang MA, Lianmeng LIU. Frontiers of Electrical and Electronic Engineering in China, CSCD, 2009, No. 1, pp. 15-19, 5 pages
Focusing on the problem that the ant colony algorithm easily stagnates and cannot fully search the solution space, a text clustering approach based on the fusion of the ant colony and genetic algorithms is proposed. The four parameters that influence the performance of the ant colony algorithm are encoded as chromosomes, and the fitness function and the selection, crossover and mutation operators are designed to find the optimal combination of parameters over a number of iterations; this combination is then applied to text clustering. The simulation results show that, compared with classical k-means clustering and the basic ant colony clustering algorithm, the proposed algorithm performs better, and the value of F-Measure is enhanced by 5.69%, 48.60% and 69.60%, respectively, in 3 test datasets. Therefore, it is more suitable for processing a larger dataset.
Keywords: ant colony clustering; genetic algorithm; fusion; text clustering
The research and realization about automatic abstracting based on text clustering and natural language understanding
13
Authors: GUO Qing-lin, FAN Xiao-zhong, LIU Chang-an. Frontiers of Electrical and Electronic Engineering in China, CSCD, 2006, No. 4, pp. 460-464, 5 pages
A method for realizing automatic abstracting based on text clustering and natural language understanding is explored, aimed at overcoming the shortcomings of some current methods. The method makes use of text clustering and can realize automatic abstracting of multiple documents. An algorithm of twice word segmentation based on the title and the first sentences of paragraphs is investigated. Its precision and recall are above 95%. For a specific domain, plastics, an automatic abstracting system named TCAAS is implemented. The precision and recall of multi-document automatic abstracting are above 75%. The experiments also show that it is feasible to use the method to develop a domain automatic abstracting system, which is valuable for further in-depth study.
Keywords: automatic abstracting; text clustering; natural language understanding
News Text Topic Clustering Optimized Method Based on TF-IDF Algorithm on Spark (Cited by: 18)
14
Authors: Zhuo Zhou, Jiaohua Qin, Xuyu Xiang, Yun Tan, Qiang Liu, Neal N. Xiong. Computers, Materials & Continua, SCIE, EI, 2020, No. 1, pp. 217-231, 15 pages
Because text topic clustering is slow on a stand-alone architecture in the context of big data, this paper takes news text as the research object and proposes an LDA text topic clustering algorithm based on the Spark big data platform. Since the word mapping of the TF-IDF (term frequency-inverse document frequency) algorithm under Spark is irreversible, the mapped word indexes cannot be traced back to the original words. In this paper, an optimized TF-IDF method under Spark is proposed to ensure that the text words can be restored. First, the text features are extracted by the TF-IDF algorithm combined with CountVectorizer proposed in this paper, and the features are then input to the LDA (Latent Dirichlet Allocation) topic model for training. Finally, the text topic clustering is obtained. Experimental results show that, for large data samples, the processing speed of LDA topic model clustering is improved on Spark. At the same time, compared with the LDA topic model based on word frequency input, the model proposed in this paper achieves lower perplexity.
Keywords: news text topic clustering; Spark platform; CountVectorizer algorithm; TF-IDF algorithm; latent Dirichlet allocation model
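The reversibility issue described above comes from how terms are indexed: a hashed index cannot be mapped back to a word, whereas an explicit vocabulary can. The plain-Python sketch below illustrates only that idea and does not use the Spark API. The function name, whitespace tokenization, and the smoothed IDF formula `log((n + 1) / (df + 1))` are assumptions chosen for the example.

```python
import math
from collections import Counter


def tfidf_with_vocab(docs):
    """TF-IDF over an explicit vocabulary so indices map back to words.

    Returns (vocab, rows): vocab[i] is the word for feature index i, and
    each row is a sparse {index: weight} dict for one document.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {term: i for i, term in enumerate(vocab)}
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append({
            index[t]: count * math.log((n + 1) / (df[t] + 1))
            for t, count in tf.items()
        })
    return vocab, rows
```

Because `vocab` is kept alongside the weights, the highest-weighted indices of a document or topic can be turned back into readable words, which is what makes topic descriptions interpretable.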
Genetic-Frog-Leaping Algorithm for Text Document Clustering (Cited by: 1)
15
Authors: Lubna Alhenak, Manar Hosny. Computers, Materials & Continua, SCIE, EI, 2019, No. 9, pp. 1045-1074, 30 pages
In recent years, the volume of information in digital form has increased tremendously owing to the increased popularity of the World Wide Web. As a result, the use of techniques for extracting useful information from large collections of data, and particularly documents, has become more necessary and challenging. Text clustering is such a technique; it consists in dividing a set of text documents into clusters (groups), so that documents within the same cluster are closely related, whereas documents in different clusters are as different as possible. Clustering depends on measuring the content (i.e., words) of a document in terms of relevance. Nevertheless, as documents usually contain a large number of words, some of them may be irrelevant to the topic under consideration or redundant. This can confuse and complicate the clustering process and make it less accurate. Accordingly, feature selection methods have been employed to reduce data dimensionality by selecting the most relevant features. In this study, we developed a text document clustering optimization model using a novel genetic frog-leaping algorithm that efficiently clusters text documents based on selected features. The proposed approach is based on two metaheuristic algorithms: a genetic algorithm (GA) and a shuffled frog-leaping algorithm (SFLA). The GA performs feature selection, and the SFLA performs clustering. To evaluate its effectiveness, the proposed approach was tested on a well-known text document dataset: the "20Newsgroup" dataset from the University of California Irvine Machine Learning Repository. Overall, after multiple experiments were compared and analyzed, it was demonstrated that using the proposed algorithm on the 20Newsgroup dataset greatly facilitated text document clustering, compared with classical K-means clustering. Nevertheless, this improvement requires longer computational time.
Keywords: text documents clustering; meta-heuristic algorithms; shuffled frog-leaping algorithm; genetic algorithm; feature selection
Concept Association and Hierarchical Hamming Clustering Model in Text Classification
16
Authors: Su Gui-yang, Li Jian-hua, Ma Ying-hua, Li Sheng-hong, Yin Zhong-hang. Wuhan University Journal of Natural Sciences, EI, CAS, 2004, No. 3, pp. 339-342, 4 pages
We propose two models in this paper. The concept association model is put forward to obtain the co-occurrence relationships among keywords in the documents, and the hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space, which addresses the extremely high dimensionality of the documents' feature space. The experimental results indicate that the concept association model can obtain the co-occurrence relations among keywords in the documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vector efficiently; the size of the vector space is only about 10% of the original dimensionality.
CLC number: TN915.08. Foundation item: Supported by the National 863 Project of China (2001AA142160, 2002AA145090). Biography: Su Gui-yang (1974-), male, Ph.D. candidate; research direction: information filter and text classification.
Keywords: text classification; concept association; hierarchical clustering; Hamming clustering
Visualization of Special Features in "The Tale of Genji" by Text Mining and Correspondence Analysis with Clustering
17
Authors: Hisako Hosoi, Takayuki Yamagata, Yuya Ikarashi, Nobuyuki Fujisawa. Journal of Flow Control, Measurement & Visualization, 2014, No. 1, pp. 1-6, 6 pages
In this paper, the visualization of special features in "The Tale of Genji", a typical work of Japanese classical literature, is studied by text mining the auxiliary verbs and examining the similarity in sentence style by correspondence analysis with clustering. The result shows that the text mining error in the number of auxiliary verbs can be as small as 15%. The features extracted in this study support multiple authorship of "The Tale of Genji", which agrees well with the result of Murakami and Imanishi [1]. It is also found that the extracted features are robust to the text mining error, which suggests that the classification error is little affected by the text mining error and that the technique could be used for further statistical study of classical literature.
Keywords: visualization; scientific art; The Tale of Genji; text mining; correspondence analysis; clustering
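Research on Cross-Platform Online Public Opinion Risk Early Warning Based on LDA and TextCNN
18
Authors: 管雨翔, 王娟, 兰月新, 张鹏. 《情报探索》, 2024, No. 10, pp. 109-115, 7 pages
[Purpose/Significance] Analyzing online public opinion data from multiple social platforms, assessing online public opinion risk, and studying risk early warning have important social significance and practical value. [Method/Process] An index system for online public opinion risk is first constructed, and the analytic hierarchy process is then used to determine the index weights; on this basis, an early-warning model for online public opinion risk is built. The empirical part analyzes online public opinion data from a prefecture-level city: LDA is first used for topic clustering of data from the Weibo platform, TextCNN is then used to classify the data from the other social platforms according to the clustered data, and finally the early-warning model is applied to study the public opinion on each topic. [Result/Conclusion] The early-warning model constructed in this paper has a certain degree of accuracy and effectiveness. It can provide information support, thereby improving decision-making efficiency and the efficiency of monitoring online public opinion risk.
Keywords: online public opinion; risk early warning; topic clustering; text classification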
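A Secondary Text Clustering Algorithm Based on TextRank (Cited by: 3)
19
Authors: 潘晓英, 胡开开, 朱静. 《计算机技术与发展》, 2016, No. 8, pp. 7-11, 5 pages
To address problems of traditional text clustering techniques such as mediocre clustering accuracy or excessively high time complexity, this paper first introduces two commonly used text clustering techniques: partition-based K-means and topic-model-based LDA. Based on an analysis of their respective shortcomings, a secondary text clustering algorithm based on TextRank is proposed. Borrowing the idea of topic models, the algorithm introduces word clustering into the traditional clustering process and, in the keyword extraction stage, incorporates the position and span features of words, which reduces the error caused by treating local keywords as global keywords. Experimental results show that the improved algorithm achieves better clustering than traditional VSM clustering and the topic-model-based LDA algorithm.
Keywords: text clustering; TextRank; keyword extraction; vector space model; LDA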
Hierarchical clustering based on single-pass for breaking topic detection and tracking (Cited by: 3)
20
Authors: Li Fenghuan, Zhao Zongfei, Wang Zhenyu. High Technology Letters, EI, CAS, 2018, No. 4, pp. 369-377, 9 pages
Single-pass is commonly used in topic detection and tracking (TDT) due to its simplicity, high efficiency and low cost. When dealing with large-scale data, however, its time cost increases sharply and clustering performance is greatly affected. To address this problem, a hierarchical clustering algorithm based on single-pass is proposed, which is inspired by hierarchical and concurrent ideas and divides the clustering process into three stages. News reports are first classified into different categories. Two single-pass clustering processes are then run within each category, followed by one agglomerative clustering across categories. In addition, to capture semantic similarity between news reports, the topic model is improved based on named entities. Experimental results show that the proposed method can effectively accelerate the process as well as improve the performance.
Keywords: topic detection and tracking (TDT); single-pass; hierarchical clustering; text clustering; topic modeling
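As background for the abstract above, a basic single-pass clusterer can be sketched as follows: each document is compared once against the existing clusters and either joins the most similar one or starts a new one. The threshold value, whitespace tokenization, and summed term counts as centroids are assumptions for illustration; the three-stage hierarchical scheme of the paper is not reproduced here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Assign each document, in arrival order, to a cluster in one pass.

    Returns a list of clusters, each a list of document indices.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [indices]}
    for idx, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(idx)
            best["centroid"].update(vec)  # summed term counts act as the centroid
        else:
            clusters.append({"centroid": Counter(vec), "members": [idx]})
    return [c["members"] for c in clusters]
```

Every new document is compared with every existing cluster, so the cost grows with the number of clusters, which is consistent with the scaling problem the abstract describes and the motivation for partitioning the reports into categories first.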