Abstract
Public opinion information is dynamic, cross-domain, and topic-oriented, which makes existing domain ontology learning methods poorly suited to the automatic construction of public opinion ontology knowledge. This paper uses web crawling to collect text on trending public opinion topics, automatically identifies and classifies the topics with a constructed model, and extracts nominal words or phrases from the topic-labeled text as the candidate concept set. A semantic similarity method is used to compute the relatedness between candidate concepts; each concept's weight is computed from this relatedness, and the concepts are ranked by weight. The core concepts related to each topic are then extracted in combination with a method based on changes in word frequency. Experimental results show that the method effectively extracts the core concepts related to public opinion topics and contributes positively to the construction of public opinion ontologies and to later knowledge sharing and reuse.
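The ranking step summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measure or how relatedness weights and word frequency change are combined, so everything here is an assumption: `relatedness_weights`, `frequency_change`, `core_concepts`, the blending parameter `alpha`, and the hand-made similarity table `TOY_SIM` are hypothetical names and values chosen only for illustration, not the paper's method.

```python
from collections import Counter
from itertools import combinations


def relatedness_weights(candidates, sim):
    """Weight each candidate by its mean similarity to the other candidates."""
    totals = {c: 0.0 for c in candidates}
    for a, b in combinations(candidates, 2):
        s = sim(a, b)
        totals[a] += s
        totals[b] += s
    others = max(len(candidates) - 1, 1)
    return {c: t / others for c, t in totals.items()}


def frequency_change(earlier_tokens, later_tokens):
    """Change in relative frequency of each term between two time windows."""
    early, late = Counter(earlier_tokens), Counter(later_tokens)
    n_early = max(len(earlier_tokens), 1)
    n_late = max(len(later_tokens), 1)
    return {t: late[t] / n_late - early[t] / n_early for t in set(early) | set(late)}


def core_concepts(candidates, sim, earlier_tokens, later_tokens, alpha=0.5, top_k=2):
    """Rank candidates by a blend of relatedness weight and frequency change.

    The linear blend controlled by alpha is an assumption for this sketch.
    """
    weights = relatedness_weights(candidates, sim)
    change = frequency_change(earlier_tokens, later_tokens)
    scores = {
        c: alpha * weights[c] + (1 - alpha) * change.get(c, 0.0) for c in candidates
    }
    return sorted(candidates, key=lambda c: scores[c], reverse=True)[:top_k]


# Toy similarity table and token windows, made up for illustration only.
TOY_SIM = {
    frozenset(("earthquake", "rescue")): 0.8,
    frozenset(("earthquake", "weather")): 0.3,
    frozenset(("rescue", "weather")): 0.1,
}


def toy_sim(a, b):
    """Look up a symmetric similarity score; unknown pairs score 0."""
    return TOY_SIM.get(frozenset((a, b)), 0.0)


candidates = ["earthquake", "rescue", "weather"]
earlier = ["weather", "weather", "earthquake", "rescue"]
later = ["earthquake", "earthquake", "rescue", "weather"]
print(core_concepts(candidates, toy_sim, earlier, later))
```

In practice `toy_sim` would be replaced by whatever word similarity measure is available, and the two token windows by tokenized text crawled at different times for the same topic.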
Source
《新疆大学学报(自然科学版)》
CAS
PKU Core Journals (北大核心)
2016, No. 3, pp. 333-337 (5 pages)
Journal of Xinjiang University(Natural Science Edition)
Funding
Science Foundation Project of Xinjiang Uygur Autonomous Region (2014211A016)
Keywords
public opinion ontology
concept extraction
word similarity
word frequency statistics