7 articles found
Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling (cited by: 4)
1
Authors: Liangping Ding, Zhixiong Zhang, Huan Liu, Jie Li, Gaihong Yu. Journal of Data and Information Science, CSCD, 2021, No. 3, pp. 35-57 (23 pages)
Purpose: Automatic keyphrase extraction (AKE) is an important task for grasping the main points of a text. In this paper, we aim to combine the benefits of the sequence labeling formulation and a pretrained language model to propose an automatic keyphrase extraction model for Chinese scientific research. Design/methodology/approach: We regard AKE from Chinese text as a character-level sequence labeling task to avoid the segmentation errors of Chinese tokenizers, and initialize our model with the pretrained language model BERT, which was released by Google in 2018. We collect data from the Chinese Science Citation Database and construct a large-scale dataset from the medical domain, which contains 100,000 abstracts as the training set, 6,000 abstracts as the development set and 3,094 abstracts as the test set. We use unsupervised keyphrase extraction methods including term frequency (TF), TF-IDF and TextRank, and supervised machine learning methods including Conditional Random Field (CRF), Bidirectional Long Short-Term Memory Network (BiLSTM) and BiLSTM-CRF, as baselines. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models. Findings: Compared with character-level BiLSTM-CRF, the best baseline model with an F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains an F1 score of 59.80%, an absolute improvement of 9.64 percentage points. Research limitations: We consider only the automatic keyphrase extraction task rather than the keyphrase generation task, so only keyphrases that occur in the given text can be extracted. In addition, our proposed dataset is not suitable for dealing with nested keyphrases. Practical implications: We make our character-level IOB format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts (CAKE) publicly available for the benefit of the research community at https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction. Originality/value: By designing comparative experiments, our study demonstrates that the character-level formulation is more suitable for the Chinese automatic keyphrase extraction task under the general trend of pretrained language models. Our proposed dataset provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent.
Keywords: Automatic keyphrase extraction; Character-level sequence labeling; Pretrained language model; Scientific Chinese medical abstracts
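The character-level IOB formulation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain B/I/O tag set and a longest-phrase-first, non-overlapping labeling rule; the helper name `char_iob_tags` and the example sentence are hypothetical, and this is not the authors' preprocessing code or the CAKE dataset format.

```python
def char_iob_tags(text, keyphrases):
    """Assign one B/I/O tag per character of `text`.

    Hypothetical helper: every occurrence of every keyphrase is tagged,
    longer keyphrases first, and characters that already carry a tag are
    not overwritten (so nested keyphrases cannot be represented).
    """
    tags = ["O"] * len(text)
    for phrase in sorted(set(keyphrases), key=len, reverse=True):
        if not phrase:
            continue
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            # Only label spans that are still entirely untagged.
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = "B"
                for i in range(start + 1, end):
                    tags[i] = "I"
            start = text.find(phrase, start + 1)
    return tags


# Illustrative sentence ("blood glucose control in diabetes patients").
text = "糖尿病患者的血糖控制"
tags = char_iob_tags(text, ["糖尿病", "血糖控制"])
for ch, tag in zip(text, tags):
    print(ch, tag)
```

Because each character gets its own tag, no word segmenter is needed before labeling, which is the motivation the abstract gives for the character-level formulation.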
Keyphrase Generation Based on Self-Attention Mechanism
2
Authors: Kehua Yang, Yaodong Wang, Wei Zhang, Jiqing Yao, Yuquan Le. Computers, Materials & Continua, SCIE, EI, 2019, No. 8, pp. 569-581 (13 pages)
Keyphrases provide summarized and valuable information. This information can help us not only understand text semantics, but also organize and retrieve text content effectively. The task of automatically generating keyphrases has received considerable attention in recent decades. Previous studies offer many workable solutions for obtaining keyphrases. One method divides the content to be summarized into multiple blocks of text, then ranks them and selects the most important content. The disadvantage of this method is that it cannot identify keyphrases that do not appear in the text, let alone capture the real semantic meaning hidden in the text. Another approach uses recurrent neural networks to generate keyphrases from the semantic aspects of the text, but its inherently sequential nature precludes parallelization within training examples, and distance limits the context dependencies that can be learned. Previous works have demonstrated the benefits of the self-attention mechanism, which can learn global text dependency features and can be parallelized. Inspired by these observations, we propose a keyphrase generation model that is based entirely on the self-attention mechanism. It is an encoder-decoder model that can effectively overcome the above disadvantages. In addition, we consider the semantic similarity between keyphrases and add a semantic similarity processing module to the model. Empirical analysis on five datasets shows that the proposed model achieves competitive performance compared to baseline methods.
Keywords: keyphrase generation; self-attention mechanism; encoder-decoder framework
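The self-attention mechanism this abstract relies on can be sketched in a few lines. The version below is a simplified illustration only: it assumes a single head and identity query/key/value projections (no learned weights), and the function names are hypothetical. It is not the paper's encoder-decoder model.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity projections.

    x: list of token vectors (lists of floats with equal length d).
    Each output vector is a weighted average of all input vectors, so
    every position can attend to every other position in one step.
    """
    d = len(x[0])
    outputs = []
    for query in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in x]
        weights = softmax(scores)
        outputs.append([sum(w * value[j] for w, value in zip(weights, x))
                        for j in range(d)])
    return outputs


out = self_attention([[1.0, 0.0], [0.0, 1.0]])
print(out)
```

Each output row is computed independently of the others, which is why this operation parallelizes across positions, unlike the step-by-step recurrence of an RNN.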
User Profiling for CSDN: Keyphrase Extraction, User Tagging and User Growth Value Prediction
3
Authors: Guoliang Xing, Hao Gao, Qi Cao, Xinyu Yue, Bingbing Xu, Keting Cen, Huawei Shen. Data Intelligence, 2019, No. 2, pp. 137-159 (23 pages)
The Chinese Software Developer Network (CSDN) is one of the largest information technology communities and service platforms in China. This paper describes user profiling for CSDN, an evaluation track of SMP Cup 2017. It contains three tasks: (1) user document keyphrase extraction, (2) user tagging and (3) user growth value prediction. In the first task, we treat keyphrase extraction as a classification problem and train a Gradient-Boosting-Decision-Tree model with comprehensive features. In the second task, to deal with class imbalance and capture the interdependency between classes, we propose a two-stage framework: (1) for each class, we train a binary classifier to model that class against all of the other classes independently; (2) we feed the output of the trained classifiers into a softmax classifier, tagging each user with multiple labels. In the third task, we propose a comprehensive architecture to predict user growth value. Our contributions in this paper are summarized as follows: (1) we extract various types of features to identify the key factors in user value growth; (2) we use a semi-supervised method and the stacking technique to extend labeled data sets and increase the generality of the trained model, resulting in an impressive performance in our experiments. In the competition, we achieved first place out of 329 teams.
Keywords: User profiling; keyphrase extraction; User tagging; Growth value prediction; Word embedding
Keyword Extraction Based on tf/idf for Chinese News Document (cited by: 24)
4
Authors: LI Juanzi, FAN Qi'na, ZHANG Kuo. Wuhan University Journal of Natural Sciences, CAS, 2007, No. 5, pp. 917-921 (5 pages)
Keyword extraction is an important research topic in information retrieval. This paper gives a specification of keywords in Chinese news documents based on an analysis of the linguistic characteristics of news documents, and then proposes a new keyword extraction method based on tf/idf with multiple strategies. The approach selects uni-, bi- and tri-grams as candidate keywords, and then defines features according to their morphological characteristics and context information. Moreover, the paper proposes several strategies to amend the incomplete words produced by word segmentation and to find unknown potential keywords in news documents. Experimental results show that the proposed method significantly outperforms the baseline method. We also applied it to retrospective event detection. Experimental results show that the accuracy and efficiency of retrospective news event detection can be significantly improved.
Keywords: keyword extraction; keyphrase extraction; news keyword
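The idea of scoring uni-, bi- and tri-gram candidates with tf/idf can be sketched as follows. This is plain tf-idf only, assuming pre-tokenized documents and the common weighting tf × log(N / df); the function names and the toy corpus are hypothetical, and the paper's additional features and repair strategies are not modeled.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Return all uni-, bi- and tri-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the n-gram candidates of one document by tf * log(N / df)."""
    candidate_lists = [ngrams(doc) for doc in docs]
    # Document frequency: in how many documents each candidate appears.
    df = Counter()
    for candidates in candidate_lists:
        df.update(set(candidates))
    tf = Counter(candidate_lists[doc_index])
    n_docs = len(docs)
    scores = {cand: count * math.log(n_docs / df[cand])
              for cand, count in tf.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda cand: (-scores[cand], cand))[:top_k]


docs = [
    ["stock", "market", "falls"],
    ["stock", "prices", "rise"],
    ["weather", "report"],
]
top = tfidf_keywords(docs, doc_index=0)
print(top)
```

Candidates that occur in many documents (here "stock") receive a lower idf and drop down the ranking, which is the basic effect the tf/idf baseline exploits.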
Text Rank for Domain Specific Using Field Association Words (cited by: 1)
5
Authors: Omnia G. El Barbary, El Sayed Atlam. Journal of Computer and Communications, 2020, No. 11, pp. 69-79 (11 pages)
Text Rank is a popular tool for obtaining words or phrases that are important for many Natural Language Processing (NLP) tasks. This paper presents a practical domain-specific approach to Text Rank using Field Association (FA) words. We present a keyphrase extraction technique not for a single document, but for a particular domain. The first step builds a specific domain field. The second collects a list of ideal FA terms and compound FA terms from the specific domain, which are considered candidate keyphrases. We then combine two-word node weights and field tree relationships into a new approach to generate keyphrases for a particular domain. Experiments using the modified approach to extract keyphrases show that the new technique using FA terms performs better than those that use normal words, and that its precision reaches 90%.
Keywords: Text Rank; keyphrase Extraction; Field Association Words; Information Retrieval
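For orientation, the basic Text Rank idea that this paper builds on can be sketched as a PageRank-style iteration over a word co-occurrence graph. The sketch assumes an unweighted, undirected graph linking adjacent tokens, a damping factor of 0.85 and a fixed iteration count; the function name `textrank` and the sample sentence are hypothetical, and the paper's Field Association weighting is not included.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words by a PageRank-style iteration on a co-occurrence graph.

    Two distinct words are linked if they occur within `window` positions
    of each other (window=2 links only adjacent tokens).
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of each neighbor's score.
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return scores


tokens = "graph ranking uses graph structure and graph edges".split()
scores = textrank(tokens)
best = max(scores, key=scores.get)
print(best)
```

Words connected to many other words accumulate the highest scores; a domain-specific variant would change how nodes and edge weights are chosen rather than the iteration itself.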
Detecting geo-relation phrases from web texts for triplet extraction of geographic knowledge: a context-enhanced method (cited by: 1)
6
Authors: Peiyuan Qiu, Li Yu, Jialiang Gao, Feng Lu. Big Earth Data, EI, 2019, No. 3, pp. 297-314 (18 pages)
As an effective organization form of geographic information, a geographic knowledge graph (GeoKG) facilitates numerous geography-related analyses and services. The completeness of triplets regarding geographic knowledge determines the quality of a GeoKG, thus drawing considerable attention in the related domains. Mass unstructured geographic knowledge scattered in web texts has been regarded as a potential source for enriching the triplets in GeoKGs. The crux of triplet extraction from web texts lies in the detection of key phrases indicating the correct geo-relations between geo-entities. However, the current methods for key-phrase detection are ineffective because the sparseness of the terms in the web texts describing geo-relations results in an insufficient training corpus. In this study, an unsupervised context-enhanced method is proposed to detect geo-relation key phrases from web texts for extracting triplets. External semantic knowledge is introduced to relieve the influence of the sparseness of the geo-relation description terms in web texts. Specifically, the contexts of geo-entities are fused with category semantic knowledge and word semantic knowledge. Subsequently, an enhanced corpus is generated using frequency-based statistics. Finally, the geo-relation key phrases are detected from the enhanced contexts using the statistical lexical features from the enhanced corpus. Experiments are conducted with real web texts. In comparison with the well-known frequency-based methods, the proposed method improves the precision of detecting the key phrases of the geo-relation description by approximately 20%. Moreover, compared with the well-defined geo-relation properties in DBpedia, the proposed method provides five times as many key phrases for indicating the geo-relations between geo-entities, which facilitates the generation of new triplets from web texts.
Keywords: Geographic knowledge graph; triplet extraction; geo-entity relation; keyphrase detection; context enhancement
The State of the Art of Natural Language Processing - A Systematic Automated Review of NLP Literature Using NLP Techniques
7
Authors: Jan Sawicki, Maria Ganzha, Marcin Paprzycki. Data Intelligence, EI, 2023, No. 3, pp. 707-749 (43 pages)
Nowadays, natural language processing (NLP) is one of the most popular areas of, broadly understood, artificial intelligence. Therefore, every day, new research contributions are posted, for instance, to the arXiv repository. Hence, it is rather difficult to capture the current "state of the field" and thus to enter it. This brought the idea to use state-of-the-art NLP techniques to analyse the NLP-focused literature. As a result, (1) meta-level knowledge concerning the current state of NLP has been captured, and (2) a guide to the use of basic NLP tools is provided. It should be noted that all the tools and the dataset described in this contribution are publicly available. Furthermore, the originality of this review lies in its full automation. This allows easy reproducibility, continuation and updating of this research in the future as new research emerges in the field of NLP.
Keywords: Natural language processing; Text processing; Literature survey; Keyword search; keyphrase search; Text embeddings; Text summarizations