10 articles found
Keyword Extraction Based on tf/idf for Chinese News Document (cited by: 24)
1
Authors: LI Juanzi, FAN Qi'na, ZHANG Kuo. Wuhan University Journal of Natural Sciences (CAS), 2007, No. 5, pp. 917-921 (5 pages)
Keyword extraction is an important research topic in information retrieval. This paper specifies what counts as a keyword in Chinese news documents, based on an analysis of the linguistic characteristics of news documents, and then proposes a new keyword extraction method based on tf/idf with multiple strategies. The approach selects uni-, bi- and tri-grams as candidate keywords and defines their features according to their morphological characteristics and context information. The paper also proposes several strategies to repair the incomplete words produced by word segmentation and to find unknown potential keywords in news documents. Experimental results show that the proposed method significantly outperforms the baseline method. The method was also applied to retrospective event detection, where experimental results show that both accuracy and efficiency improve significantly.
Keywords: keyword extraction; keyphrase extraction; news keyword
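As an illustration of the tf/idf baseline this entry builds on, here is a minimal sketch in Python. It is not the paper's multi-strategy method: the function name `tfidf_keywords`, the toy corpus, and the plain `tf * log(N / df)` weighting are assumptions made for the example, and the documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of one tokenized document by tf * idf (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    tf = Counter(docs[doc_index])
    length = len(docs[doc_index])
    scores = {
        term: (count / length) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy, pre-segmented corpus (illustrative data only).
docs = [
    ["flood", "rescue", "city", "flood"],
    ["city", "election", "vote"],
    ["city", "market", "vote"],
]
print(tfidf_keywords(docs, 0, top_k=2))  # ['flood', 'rescue']
```

A term such as "city" that appears in every document gets an idf of zero, which is why it drops out of the ranking even though it occurs in the target document.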
TKES: A Novel System for Extracting Trendy Keywords from Online News Sites
2
Authors: Tham Vo, Phuc Do. Journal of the Operations Research Society of China (EI, CSCD), 2022, No. 4, pp. 801-816 (16 pages)
As smart city trends, especially artificial intelligence, data science, and the internet of things, have attracted a lot of attention, many researchers have created smart applications to improve people's quality of life. Because automatically collecting and exploiting information is essential in the era of Industry 4.0, a variety of models have been proposed for storage problem solving and efficient data mining. In this paper, we present our proposed system, Trendy Keyword Extraction System (TKES), which is designed for extracting trendy keywords from text streams. The system also supports storing, analyzing, and visualizing documents coming from text streams. The system first automatically collects daily articles, then ranks the importance of keywords by calculating their frequency of occurrence, and finds trendy keywords using the Burst Detection Algorithm, which is proposed in this paper based on the idea of Kleinberg. A burst is defined as a period of time when a keyword is continuously and unusually popular over the text stream, and the identification of bursts is known as the burst detection procedure. The results of user requests can be displayed visually. Furthermore, we create a method to find a trendy keyword set, defined as a set of keywords that belong to the same burst. This work also describes the datasets used for our experiments and the processing speed tests of our two proposed algorithms.
Keywords: Event detection; Burst detection; keyword extraction; Kleinberg; Burst ranking; TKES; Text stream
Automatic Arabic Document Classification via kNN
3
Author: HANI M. O. Iwidat. Computer Aided Drafting, Design and Manufacturing, 2008, No. 2, pp. 65-73 (9 pages)
Many algorithms have been implemented for document categorization. Most work in this area targets English text, while very few approaches have been introduced for Arabic text. Arabic text differs in nature from English text, and its preprocessing is more challenging, because Arabic is a highly inflectional and derivational language, which makes document mining a hard and complex task. In this paper, we present an automatic Arabic document classification system based on the kNN algorithm. We also develop an approach to keyword extraction and reduction that uses the Document Frequency (DF) threshold method. The results indicate that kNN handles Arabic text better than the other existing systems. The proposed system reached a micro-recall score of 0.95 on 850 Arabic texts in 6 different categories.
Keywords: Arabic documents classification; KNN; vector model; keywords extraction
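The Document Frequency (DF) threshold mentioned in this abstract can be sketched roughly as follows. This is an assumption-laden illustration, not the paper's system: the function name `df_threshold_vocabulary`, the `min_df` parameter, and the toy English tokens are all hypothetical, and a real pipeline would first apply Arabic-specific preprocessing before passing the reduced vocabulary to a kNN classifier.

```python
from collections import Counter


def df_threshold_vocabulary(docs, min_df=2):
    """Keep only terms that occur in at least `min_df` documents (hypothetical helper)."""
    # Count each term once per document, regardless of how often it repeats.
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(term for term, count in df.items() if count >= min_df)


# Toy tokenized documents (illustrative data only).
sample_docs = [
    ["economy", "bank", "bank"],
    ["economy", "sport"],
    ["economy", "bank", "weather"],
]
print(df_threshold_vocabulary(sample_docs, min_df=2))  # ['bank', 'economy']
```

Raising `min_df` shrinks the feature space, which reduces the cost of the distance computations a kNN classifier performs at prediction time.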
News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model (cited by: 7)
4
Authors: Ao Xiong, Derong Liu, Hongkang Tian, Zhengyuan Liu, Peng Yu, Michel Kadoch. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2021, No. 6, pp. 886-893 (8 pages)
The internet is an abundant source of news every day. Thus, efficient algorithms to extract keywords from text are important for obtaining information quickly. However, the precision and recall of mature keyword extraction algorithms need improvement. TextRank, which is derived from the PageRank algorithm, uses word graphs to spread the weight of words. The keyword weight propagation in TextRank focuses only on word frequency. To improve the performance of the algorithm, we propose Semantic Clustering TextRank (SCTR), a semantic clustering news keyword extraction algorithm based on TextRank. First, the word vectors generated by the Bidirectional Encoder Representations from Transformers (BERT) model are clustered with k-means to represent semantic clustering. Then, the clustering results are used to construct a TextRank weight transfer probability matrix. Finally, the word graph is computed iteratively and keywords are extracted. The test target of this experiment is a Chinese news library. The results of the experiment conducted on this text set show that the SCTR algorithm achieves higher precision, recall, and F1 value than the traditional TextRank and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms.
Keywords: keyword extraction; TextRank; semantics; word vector
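For orientation, the plain TextRank baseline that SCTR modifies can be sketched as below. This is only the traditional algorithm with uniform edge weights; it does not include the BERT vectors, k-means clustering, or the cluster-based transfer matrix described in the abstract. The function name `textrank`, the window size, the damping factor of 0.85, and the toy tokens are assumptions for the example.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank words on an undirected co-occurrence graph (hypothetical helper)."""
    # Link each word to the words that follow it inside the sliding window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of the score of every neighbor.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores, key=lambda w: (-scores[w], w))


# Toy token sequence (illustrative data only).
tokens = ["news", "keyword", "graph", "keyword", "rank", "keyword", "news"]
ranking = textrank(tokens)
print(ranking[0])
```

In this toy sequence "keyword" is adjacent to every other word, so it ends up at the top of the ranking. A semantic variant would replace the equal neighbor shares with weights derived from word clusters.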
Can prior knowledge help graph-based methods for keyword extraction? (cited by: 1)
5
Authors: Zhiyuan LIU, Maosong SUN. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2012, No. 2, pp. 242-253 (12 pages)
Graph-based methods are among the most widely used unsupervised approaches to keyword extraction. In this approach, words are linked according to their co-occurrences within the document. Graph-based ranking algorithms are then used to rank the words, and those with the highest scores are selected as keywords. Although graph-based methods are effective for keyword extraction, they rank words based merely on word graph topology. In fact, various kinds of prior knowledge indicate how likely a word is to be a keyword. This knowledge may be frequency-based, position-based, or semantic-based. In this paper, we propose to incorporate prior knowledge into graph-based methods for keyword extraction and investigate its contributions. Experiments reveal that prior knowledge can significantly improve the performance of graph-based keyword extraction. Moreover, by combining prior knowledge with neighborhood knowledge, we achieve the best experimental results compared to previous graph-based methods.
Keywords: keyword extraction; prior knowledge; PageRank; DiffusionRank
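One common way to inject prior knowledge into a graph ranker is to bias the random-jump term of PageRank toward words with a higher prior. The sketch below shows that idea under stated assumptions; it is not necessarily the formulation used in the paper, and it does not implement DiffusionRank. The function name `pagerank_with_prior`, the prior weights, and the three-word graph are hypothetical.

```python
def pagerank_with_prior(edges, prior, damping=0.85, iterations=100):
    """PageRank on an undirected word graph with a prior-biased jump vector."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Normalize the prior so it forms a probability distribution over nodes.
    total = sum(prior.get(n, 0.0) for n in nodes)
    p = {n: prior.get(n, 0.0) / total for n in nodes}
    scores = dict(p)
    for _ in range(iterations):
        scores = {
            n: (1 - damping) * p[n]
            + damping * sum(scores[m] / len(neighbors[m]) for m in neighbors[n])
            for n in nodes
        }
    return scores


# Toy path graph a - b - c (illustrative data only).
edges = [("a", "b"), ("b", "c")]
uniform = pagerank_with_prior(edges, {"a": 1, "b": 1, "c": 1})
biased = pagerank_with_prior(edges, {"a": 3, "b": 1, "c": 1})
print(uniform["a"] == uniform["c"], biased["a"] > biased["c"])
```

With a uniform prior the two end words are indistinguishable because the graph is symmetric; a frequency-based or position-based prior breaks that tie, which is the kind of effect the abstract attributes to prior knowledge.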
Deep Neural Semantic Network for Keywords Extraction on Short Text
6
Authors: Chundong She, Huanying You, Changhai Lin, Shaohua Liu, Boxiang Liang, Juan Jia, Xinglei Zhang, Yanming Qi. 《国际计算机前沿大会会议论文集》, 2020, No. 2, pp. 101-112 (12 pages)
Keyword extraction is a branch of natural language processing that plays an important role in many tasks, such as long text classification, automatic summarization, machine translation, and dialogue systems. All of them need high-quality keywords as a starting point. In this paper, we propose a deep learning network called deep neural semantic network (DNSN) to solve the problem of short text keyword extraction. It maps short text and words to the same semantic space, obtains their semantic vectors at the same time, and then computes the similarity between short text and words to extract the top-ranked words as keywords. Bidirectional Encoder Representations from Transformers is first used to obtain the initial semantic feature vectors of short text and words; these vectors are then fed to a residual network to obtain the final semantic vectors of short text and words in the same vector space. Finally, the keywords are extracted by calculating the similarity between short text and words. Compared with existing baseline models, including Frequency, Term Frequency-Inverse Document Frequency (TF-IDF), and TextRank, the proposed model is superior in Precision, Recall, and F-score on the same batch of test data. In the best case, its precision, recall, and F-score are 6.79%, 5.67%, and 11.08% higher than the baseline model, respectively.
Keywords: Semantic similarity; Semantic network; Short text; keywords extraction
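The final ranking step described in this abstract, scoring each word by its similarity to the short text in a shared vector space, can be sketched as follows. Cosine similarity is assumed here because the abstract does not name the measure, and the two-dimensional vectors are hand-written stand-ins for the learned semantic vectors. The names `cosine` and `rank_by_similarity` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(text_vec, word_vecs, top_k=2):
    """Return the `top_k` words whose vectors are closest to the text vector."""
    scored = {word: cosine(text_vec, vec) for word, vec in word_vecs.items()}
    return sorted(scored, key=lambda w: (-scored[w], w))[:top_k]


# Toy vectors standing in for learned embeddings (illustrative data only).
text_vec = [1.0, 0.0]
word_vecs = {"apple": [0.9, 0.1], "car": [0.0, 1.0], "fruit": [1.0, 0.0]}
print(rank_by_similarity(text_vec, word_vecs))  # ['fruit', 'apple']
```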
A Recommendation Mechanism for Web Publishing Based on Sentiment Analysis of Microblog (cited by: 2)
7
Authors: TIAN Pingfang, ZHU Zhonghua, XIONG Li, XU Fangfang. Wuhan University Journal of Natural Sciences (CAS, CSCD), 2015, No. 2, pp. 146-152 (7 pages)
Microblog is a social platform with a huge user community and massive data. We propose a semantic recommendation mechanism for microblog based on sentiment analysis. First, the keywords and sensibility words in this mechanism are extracted by natural language processing, including segmentation, lexical analysis and strategy selection. Then, we query the background knowledge base, which is based on linked open data (LOD), with the basic information of users. The experimental results show that, with sentiment analysis and semantic query, the accuracy of recommendation is in the range of 70%-89%. Compared with the traditional recommendation method, this method can better satisfy users' requirements.
Keywords: sentiment analysis; microblog; keyword extraction; linked open data; background knowledge base
Portraying User Life Status from Microblogging Posts (cited by: 1)
8
Authors: Jiayu Tang, Zhiyuan Liu, Maosong Sun, Jiahua Liu. Tsinghua Science and Technology (SCIE, EI, CAS), 2013, No. 2, pp. 182-195 (14 pages)
Microblogging services provide a novel and popular communication scheme for Web users to share information and express opinions by publishing short posts, which usually reflect the users' daily life. We can thus model the users' daily status and interests according to their posts. Because the content of microblog users' posts is highly complex and large in volume, it is necessary to provide a quick summary of the users' life status, both for personal users and for commercial services. It is non-trivial to summarize the life status of microblog users, particularly when the summary covers a long period. In this paper, we present a compact interactive visualization prototype, LifeCircle, as an efficient summary for exploring the long-term life status of microblog users. The radial visualization provides multiple views for a given microblog user, including annual topics, monthly keywords, monthly sentiments, and temporal trends of posts. We tightly integrate interactive visualization with novel and state-of-the-art microblogging analytics to maximize their advantages. We implement LifeCircle on Sina Weibo, the most popular microblogging service in China, and illustrate the effectiveness of our prototype with various case studies. Results show that our prototype makes users nostalgic and reminds them of past events, which helps them to better understand themselves and others.
Keywords: text visualization; microblogging; topic model; sentiment analysis; keyword extraction
Spoken dialog summarization system with HAPPINESS/SUFFERING factor recognition
9
Authors: Yang-Yen OU, Ta-Wen KUAN, Anand PAUL, Jhing-Fa WANG, An-Chao TSAI. Frontiers of Computer Science (SCIE, EI, CSCD), 2017, No. 3, pp. 429-443 (15 pages)
This work presents a spoken dialog summarization system with HAPPINESS/SUFFERING factor recognition. The semantic content of the spoken dialog is compressed and classified by factor category. The transcription from automatic speech recognition is then processed by the Chinese Knowledge and Information Processing segmentation system. The proposed system also uses part-of-speech tags to select and rank the keywords effectively. Finally, HAPPINESS/SUFFERING factor recognition is performed with the proposed point-wise mutual information. Compared with the original method, performance is improved by applying the significance scores of keywords. The experimental results show that the average precision rate for factor recognition in the outside test can reach 73.5%, which demonstrates the possibility and potential of the proposed system.
Keywords: spoken dialog summarization; keyword extraction; natural language processing (NLP); sentiment analysis
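Point-wise mutual information between a keyword and a factor category can be estimated from co-occurrence counts, as in the sketch below. This uses the textbook definition PMI(w, c) = log2(P(w, c) / (P(w) P(c))); the abstract does not spell out the paper's own variant or how keyword significance scores enter it, so none of that is modelled. The function name `pmi_table` and the labelled samples are hypothetical.

```python
import math
from collections import Counter


def pmi_table(samples):
    """Compute PMI(word, label) from (keywords, label) pairs (hypothetical helper)."""
    n = len(samples)
    word_counts, label_counts, joint_counts = Counter(), Counter(), Counter()
    for keywords, label in samples:
        label_counts[label] += 1
        # Count each word once per sample.
        for word in set(keywords):
            word_counts[word] += 1
            joint_counts[(word, label)] += 1
    return {
        (word, label): math.log2(
            (joint / n) / ((word_counts[word] / n) * (label_counts[label] / n))
        )
        for (word, label), joint in joint_counts.items()
    }


# Toy labelled dialog keywords (illustrative data only).
samples = [
    (["smile", "friend"], "HAPPINESS"),
    (["smile", "gift"], "HAPPINESS"),
    (["pain", "friend"], "SUFFERING"),
    (["pain", "loss"], "SUFFERING"),
]
pmi = pmi_table(samples)
print(pmi[("smile", "HAPPINESS")], pmi[("friend", "HAPPINESS")])  # 1.0 0.0
```

A positive value means the word occurs with the category more often than chance would predict, while a word spread evenly across categories, such as "friend" here, scores zero.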
Identifying User Profile by Incorporating Self-Attention Mechanism based on CSDN Data Set
10
Authors: Junru Lu, Le Chen, Kongming Meng, Fengyi Wang, Jun Xiang, Nuo Chen, Xu Han, Binyang Li. Data Intelligence, 2019, No. 2, pp. 160-175 (16 pages)
With the popularity of social media, there has been increasing interest in user profiling and its applications. This paper presents our system, named UIR-SIST, for the User Profiling Technology Evaluation Campaign in SMP CUP 2017. UIR-SIST aims to complete three tasks: keyword extraction from blogs, user interest labeling, and user growth value prediction. To this end, we first extract keywords from a user's blog, including the blog itself, blogs on the same topic, and other blogs published by the same user. Then a unified neural network model is constructed based on a convolutional neural network (CNN) for user interest tagging. Finally, we adopt a stacking model for predicting user growth value. We ultimately ranked sixth, with evaluation scores of 0.563, 0.378 and 0.751 on the three tasks, respectively.
Keywords: User profile; Convolutional neural network (CNN); Self-attention; keyword extraction