期刊文献+
共找到2篇文章
< 1 >
每页显示 20 50 100
Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling 被引量:4
1
作者 Liangping Ding Zhixiong Zhang +2 位作者 Huan Liu Jie Li GaihongYu 《Journal of Data and Information Science》 CSCD 2021年第3期35-57,共23页
Purpose:Automatic keyphrase extraction(AKE)is an important task for grasping the main points of the text.In this paper,we aim to combine the benefits of sequence labeling formulation and pretrained language model to p... Purpose:Automatic keyphrase extraction(AKE)is an important task for grasping the main points of the text.In this paper,we aim to combine the benefits of sequence labeling formulation and pretrained language model to propose an automatic keyphrase extraction model for Chinese scientific research.Design/methodology/approach:We regard AKE from Chinese text as a character-level sequence labeling task to avoid segmentation errors of Chinese tokenizer and initialize our model with pretrained language model BERT,which was released by Google in 2018.We collect data from Chinese Science Citation Database and construct a large-scale dataset from medical domain,which contains 100,000 abstracts as training set,6,000 abstracts as development set and 3,094 abstracts as test set.We use unsupervised keyphrase extraction methods including term frequency(TF),TF-IDF,TextRank and supervised machine learning methods including Conditional Random Field(CRF),Bidirectional Long Short Term Memory Network(BiLSTM),and BiLSTM-CRF as baselines.Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models.Findings:Compared with character-level BiLSTM-CRF,the best baseline model with F1 score of 50.16%,our character-level sequence labeling model based on BERT obtains F1 score of 59.80%,getting 9.64%absolute improvement.Research limitations:We just consider automatic keyphrase extraction task rather than keyphrase generation task,so only keyphrases that are occurred in the given text can be extracted.In addition,our proposed dataset is not suitable for dealing with nested keyphrases.Practical implications:We make our character-level IOB format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts(CAKE)publicly available for the benefits of research community,which is available at:https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction.Originality/value:By designing comparative experiments,our study demonstrates that character-level formulation is more suitable for Chinese automatic keyphrase extraction task under the general trend of pretrained language models.And our proposed dataset provides a unified method for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent. 展开更多
关键词 Automatic keyphrase extraction Character-level sequence labeling Pretrained language model Scientific chinese medical abstracts
下载PDF
Hashtag Recommendation Using LSTM Networks with Self-Attention 被引量:2
2
作者 Yatian Shen Yan Li +5 位作者 Jun Sun Wenke Ding Xianjin Shi Lei Zhang Xiajiong Shen Jing He 《Computers, Materials & Continua》 SCIE EI 2019年第9期1261-1269,共9页
On Twitter,people often use hashtags to mark the subject of a tweet.Tweets have specific themes or content that are easy for people to manage.With the increase in the number of tweets,how to automatically recommend ha... On Twitter,people often use hashtags to mark the subject of a tweet.Tweets have specific themes or content that are easy for people to manage.With the increase in the number of tweets,how to automatically recommend hashtags for tweets has received wide attention.The previous hashtag recommendation methods were to convert the task into a multi-class classification problem.However,these methods can only recommend hashtags that appeared in historical information,and cannot recommend the new ones.In this work,we extend the self-attention mechanism to turn the hashtag recommendation task into a sequence labeling task.To train and evaluate the proposed method,we used the real tweet data which is collected from Twitter.Experimental results show that the proposed method can be significantly better than the most advanced method.Compared with the state-of-the-art methods,the accuracy of our method has been increased 4%. 展开更多
关键词 Hashtags recommendation self-attention neural networks sequence labeling
下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部