期刊文献+
共找到4篇文章
< 1 >
每页显示 20 50 100
Keyword Extraction Based on tf/idf for Chinese News Document 被引量:24
1
作者 LI Juanzi FAN Qi'na ZHANG Kuo 《Wuhan University Journal of Natural Sciences》 CAS 2007年第5期917-921,共5页
Keyword extraction is an important research topic of information retrieval. This paper gave the specification of keywords in Chinese news documents based on analyzing linguistic characteristics of news documents and t... Keyword extraction is an important research topic of information retrieval. This paper gave the specification of keywords in Chinese news documents based on analyzing linguistic characteristics of news documents and then proposed a new keyword extraction method based on tf/idf with multi-strategies. The approach selected candidate keywords of uni-, hi- and tri-grams, and then defines the features according to their morphological characters and context information. Moreover, the paper proposed several strategies to amend the incomplete words gotten from the word segmentation and found unknown potential keywords in news documents. Experimental results show that our proposed method can significantly outperform the baseline method. We also applied it to retrospective event detection. Experimental results show that the accuracy and efficiency of news retrospective event detection can be significantly improved. 展开更多
关键词 keyword extraction keyphrase extraction news keyword
下载PDF
Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling 被引量:4
2
作者 Liangping Ding Zhixiong Zhang +2 位作者 Huan Liu Jie Li GaihongYu 《Journal of Data and Information Science》 CSCD 2021年第3期35-57,共23页
Purpose:Automatic keyphrase extraction(AKE)is an important task for grasping the main points of the text.In this paper,we aim to combine the benefits of sequence labeling formulation and pretrained language model to p... Purpose:Automatic keyphrase extraction(AKE)is an important task for grasping the main points of the text.In this paper,we aim to combine the benefits of sequence labeling formulation and pretrained language model to propose an automatic keyphrase extraction model for Chinese scientific research.Design/methodology/approach:We regard AKE from Chinese text as a character-level sequence labeling task to avoid segmentation errors of Chinese tokenizer and initialize our model with pretrained language model BERT,which was released by Google in 2018.We collect data from Chinese Science Citation Database and construct a large-scale dataset from medical domain,which contains 100,000 abstracts as training set,6,000 abstracts as development set and 3,094 abstracts as test set.We use unsupervised keyphrase extraction methods including term frequency(TF),TF-IDF,TextRank and supervised machine learning methods including Conditional Random Field(CRF),Bidirectional Long Short Term Memory Network(BiLSTM),and BiLSTM-CRF as baselines.Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models.Findings:Compared with character-level BiLSTM-CRF,the best baseline model with F1 score of 50.16%,our character-level sequence labeling model based on BERT obtains F1 score of 59.80%,getting 9.64%absolute improvement.Research limitations:We just consider automatic keyphrase extraction task rather than keyphrase generation task,so only keyphrases that are occurred in the given text can be extracted.In addition,our proposed dataset is not suitable for dealing with nested keyphrases.Practical implications:We make our character-level IOB format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts(CAKE)publicly available for the benefits of research community,which is available at:https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction.Originality/value:By designing comparative experiments,our study demonstrates that character-level formulation is more suitable for Chinese automatic keyphrase extraction task under the general trend of pretrained language models.And our proposed dataset provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent. 展开更多
关键词 Automatic keyphrase extraction Character-level sequence labeling Pretrained language model Scientific chinese medical abstracts
下载PDF
Text Rank for Domain Specific Using Field Association Words 被引量:1
3
作者 Omnia G. El Barbary El Sayed Atlam 《Journal of Computer and Communications》 2020年第11期69-79,共11页
Text Rank is a popular tool for obtaining words or phrases that are important for many Natural Language Processing (NLP) tasks. This paper presents a practical approach for Text Rank domain specific using Field Associ... Text Rank is a popular tool for obtaining words or phrases that are important for many Natural Language Processing (NLP) tasks. This paper presents a practical approach for Text Rank domain specific using Field Association (FA) words. We present the keyphrase separation technique not for a single document, although for a particular domain. The former builds a specific domain field. The second collects a list of ideal FA terms and compounds FA terms from the specific domain that are considered to be contender keyword phrases. Therefore, we combine two-word node weights and field tree relationships into a new approach to generate keyphrases from a particular domain. Studies using the changed approach to extract key phrases demonstrate that the latest techniques including FA terms are stronger than the others that use normal words and its precise words reach 90%. 展开更多
关键词 Text Rank Keyphrase extraction Field Association Words Information Retrieval
下载PDF
User Profiling for CSDN:Keyphrase Extraction,User Tagging and User Growth Value Prediction
4
作者 Guoliang Xing Hao Gao +4 位作者 Qi Cao Xinyu Yue Bingbing Xu Keting Cen Huawei Shen 《Data Intelligence》 2019年第2期137-159,共23页
The Chinese Software Developer Network(CSDN)is one of the largest information technology communities and service platforms in China.This paper describes the user profiling for CSDN,an evaluation track of SMP Cup 2017.... The Chinese Software Developer Network(CSDN)is one of the largest information technology communities and service platforms in China.This paper describes the user profiling for CSDN,an evaluation track of SMP Cup 2017.It contains three tasks:(1)user document keyphrase extraction,(2)user tagging and(3)user growth value prediction.In the first task,we treat keyphrase extraction as a classification problem and train a Gradient-Boosting-Decision-Tree model with comprehensive features.In the second task,to deal with class imbalance and capture the interdependency between classes,we propose a two-stage framework:(1)for each class,we train a binary classifier to model each class against all of the other classes independently;(2)we feed the output of the trained classifiers into a softmax classifier,tagging each user with multiple labels.In the third task,we propose a comprehensive architecture to predict user growth value.Our contributions in this paper are summarized as follows:(1)we extract various types of features to identify the key factors in user value growth;(2)we use the semi-supervised method and the stacking technique to extend labeled data sets and increase the generality of the trained model,resulting in an impressive performance in our experiments.In the competition,we achieved the first place out of 329 teams. 展开更多
关键词 User profiling Keyphrase extraction User tagging Growth value prediction Word embedding
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部