Abstract
[Purpose/significance] Taking topical phrase identification as its research object, this paper proposes a topical phrase mining method based on the PhraseLDA model. The method offers a reference approach for quickly understanding text content and accurately extracting text topics. [Method/process] The paper gives a quantitative definition of low-frequency words and proposes a method for computing phrase importance. It then uses the PhraseLDA topic model to infer the topical phrases. [Result/conclusion] Experimental results show that, across multiple datasets, the topical phrases mined by this method are of relatively high quality and show strong topical coherence.
Source
Library and Information Service (《图书情报工作》)
CSSCI
PKU Core Journal (北大核心)
2017, Issue 8, pp. 120-125 (6 pages)
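Funding
One of the research outputs of the Chinese Academy of Sciences project "Construction of the CAS-wide Science and Technology Information Monitoring Center" (project no. 院1628-4)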