Abstract
This paper takes co-occurring keyword-descriptor pairs in scientific and technical literature (hereafter "co-occurring word pairs") as the object of mining and proposes a method for automatically recognizing synonymy within them. The method has three steps. First, word-form (literal) similarity values are computed, and for each descriptor the keyword with the highest similarity is taken as its unique synonymous counterpart, called the starting word. Second, a probability method, together with co-occurrence frequency, is used to measure the co-occurrence strength of each word pair, and the pairs are ranked descriptor by descriptor from the tightest co-occurrence to the loosest. Third, with the starting word as the reference word, all co-occurring word pairs are identified one by one, layer by layer, under a set of rules that follow two co-occurrence regularities: "synonyms repel each other" and "semantically related terms attract each other". Experimental results indicate that this layer-by-layer approach is a reliable and stable method for recognizing synonymy between co-occurring keywords and descriptors.
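The first two steps can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption: literal similarity is approximated by a Dice coefficient over shared characters, co-occurrence strength by the estimated conditional probability P(keyword | descriptor), and the function names and toy counts are hypothetical. The third step (layer-by-layer identification) depends on rules the abstract does not state and is not sketched.

```python
from collections import Counter


def literal_similarity(a: str, b: str) -> float:
    """Dice coefficient over shared characters.

    Assumption: a stand-in for the paper's word-form similarity,
    whose exact formula is not given in the abstract.
    """
    if not a or not b:
        return 0.0
    overlap = sum((Counter(a) & Counter(b)).values())
    return 2 * overlap / (len(a) + len(b))


def pick_starting_word(descriptor: str, keywords) -> str:
    """Step 1: the keyword most similar in form to the descriptor."""
    return max(keywords, key=lambda k: literal_similarity(descriptor, k))


def rank_by_cooccurrence(descriptor_count: int, cooc_counts: dict) -> list:
    """Step 2: rank keywords by estimated P(keyword | descriptor), highest first.

    Assumption: the 'probability method' is modeled as a simple
    conditional probability computed from raw counts.
    """
    strengths = {k: c / descriptor_count for k, c in cooc_counts.items()}
    return sorted(strengths.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: one descriptor and the keywords co-occurring with it.
descriptor = "计算机网络"
cooc = {"网络": 40, "计算机网": 25, "路由器": 10}

start = pick_starting_word(descriptor, cooc)   # starting word for this descriptor
ranking = rank_by_cooccurrence(100, cooc)      # 100 = documents indexed with the descriptor
```

In this sketch the starting word would then serve as the reference against which the ranked keywords are examined in turn.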
Source
Information Science (《情报科学》)
CSSCI
PKU Core Journal
2013, Issue 4, pp. 84-88 (5 pages)
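Funding
Ministry of Education Humanities and Social Sciences Research General Project (10YJC870051)
Guangdong Universities Outstanding Young Innovative Talents Cultivation Project (wym09089)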
Keywords
synonym recognition
keyword-descriptor co-occurrence
word-form similarity
probability method