Abstract
Keyword extraction underlies text summarization, text retrieval, and text mining, and is widely used in natural language processing. This paper reviews keyword extraction methods and the current state of research, dividing the methods into traditional approaches and deep-learning-based approaches. It compares the basic ideas, advantages, and disadvantages of each class of methods and summarizes the evaluation metrics used for keyword extraction. It further surveys research on keyword extraction for ethnic minority languages, describes the difficulties in processing these languages and their scripts, and summarizes keyword extraction techniques and application scenarios for several of them, including Mongolian, Tibetan, and Uyghur.
Authors
BAI Shu-guang (白曙光)
LIN Min (林民)
LI Yan-ling (李艳玲)
ZHANG Shu-jun (张树钧)
College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
Source
《内蒙古师范大学学报(自然科学版)》
CAS
2021, No. 2, pp. 134-144 (11 pages)
Journal of Inner Mongolia Normal University(Natural Science Edition)
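Funding
National Natural Science Foundation of China (61806103, 61562068)
Inner Mongolia Autonomous Region "Grassland Talents" Project, Young Innovative and Entrepreneurial Talents Program
Inner Mongolia Autonomous Region Science and Technology Plan Project (JH20180175)
National 242 Project (2019A114)
Inner Mongolia Normal University Graduate Innovation Fund (CXJJS19151)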
Keywords
extraction
deep learning
Latent Dirichlet Allocation (LDA) topic model
TextRank