Abstract
To address shortcomings of traditional co-word analysis, this paper proposes a new co-word analysis process model that improves on the traditional method in two respects. First, author-assigned keywords cannot fully describe a paper's topic, so they need to be supplemented: high-frequency author-assigned keywords are selected to form a supplementary dictionary, candidate keywords are extracted from each paper's title using word segmentation based on that dictionary, and the candidates are added to the paper's keywords according to certain rules. Second, because co-occurrence frequency alone does not accurately describe the similarity of a word pair, a domain ontology is introduced to compute the semantic similarity of high-frequency keyword pairs, and the co-occurrence frequency and the semantic similarity are combined into a relatedness value for each pair. This relatedness value is used to describe word-pair similarity and serves as the basis for building the co-word matrix. Finally, experiments demonstrate the effectiveness of the improved method.
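The two improvements described in the abstract can be illustrated with a minimal sketch. The abstract does not give the segmentation algorithm, the supplementation rules, or the formula that combines co-occurrence and semantic similarity, so everything below is an assumption made for illustration: forward maximum matching stands in for the dictionary-based segmentation, "add every title candidate not already present" stands in for the supplementation rules, and a weighted sum of a cosine-normalized co-occurrence count and an ontology-derived similarity (passed in as a precomputed table) stands in for the relatedness measure. All function names and the weight `alpha` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def build_supplementary_dictionary(keyword_lists, min_freq):
    """Keep author keywords that appear in at least min_freq papers."""
    counts = Counter(kw for kws in keyword_lists for kw in set(kws))
    return {kw for kw, n in counts.items() if n >= min_freq}


def extract_candidates(title, dictionary):
    """Forward maximum matching of dictionary terms against a title string.

    Assumption: the paper's actual segmentation method is not specified.
    """
    max_len = max((len(w) for w in dictionary), default=0)
    found, i = [], 0
    while i < len(title):
        # Try the longest possible piece first, then shrink.
        for size in range(min(max_len, len(title) - i), 0, -1):
            piece = title[i:i + size]
            if piece in dictionary:
                found.append(piece)
                i += size
                break
        else:
            i += 1  # no dictionary term starts here; move on one character
    return found


def supplement(keywords, title, dictionary):
    """Hypothetical rule: append each title candidate not already listed."""
    merged = list(keywords)
    for cand in extract_candidates(title, dictionary):
        if cand not in merged:
            merged.append(cand)
    return merged


def relatedness(keyword_lists, semantic_sim, alpha=0.5):
    """Combine co-occurrence and semantic similarity for every keyword pair.

    Assumed formula: alpha * c_ab / sqrt(c_a * c_b) + (1 - alpha) * sim(a, b),
    where semantic_sim maps (a, b) to an ontology-based similarity in [0, 1].
    """
    freq, pair = Counter(), Counter()
    for kws in keyword_lists:
        uniq = sorted(set(kws))
        freq.update(uniq)
        pair.update(combinations(uniq, 2))
    scores = {}
    for a, b in combinations(sorted(freq), 2):
        cooc = pair[(a, b)] / sqrt(freq[a] * freq[b])
        sem = semantic_sim.get((a, b), semantic_sim.get((b, a), 0.0))
        scores[(a, b)] = alpha * cooc + (1 - alpha) * sem
    return scores
```

The resulting `scores` dictionary can be read as the upper triangle of a co-word matrix whose entries are relatedness values rather than raw co-occurrence counts. The semantic similarity table is left as an input because the abstract does not say which ontology or similarity measure the authors used.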
Source
《现代图书情报技术》
CSSCI
PKU Core Journal
2013, No. 11, pp. 60-67 (8 pages)
New Technology of Library and Information Service
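Funding
A research output of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (Grant No. 71273194)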
Keywords
Co-word analysis
Extension dictionary
Domain ontology