Abstract
To build entity-relationship networks and to improve concept-based information retrieval, this paper proposes a method for extracting attribute value information of concept instances from a machine-readable dictionary without targeting specific attribute types. First, an initial set of entity-attribute value pairs is generated through manual annotation and selection, and a rough set of pattern instances is extracted from it. Second, the pattern instance set is clustered, merged, and expanded to yield several groups of pattern instances, each group representing one attribute type. Finally, attribute value information for new entity words is extracted from the dictionary. Synonym expansion and lexical semantic similarity computation are introduced when processing the pattern instance set in order to improve the coverage of the pattern instances. In the experiment, extraction was carried out on electronics-domain vocabulary in the Standard Dictionary of Modern Chinese (《现代汉语规范词典》) and achieved fairly good results.
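The three steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the glosses, seed pairs, and synonym table are invented English toy data (the paper works on Chinese dictionary entries), a "pattern instance" is assumed to be the two tokens to the left of the attribute value, and Jaccard token overlap stands in for the lexical semantic similarity measure, which the abstract does not specify. All function names are hypothetical.

```python
from collections import defaultdict

# Toy data, all hypothetical: invented glosses, seeds, and synonyms for illustration.
GLOSSES = {
    "diode": "a component made of silicon that conducts current in one direction",
    "resistor": "a component used for limiting current",
    "capacitor": "a device built from ceramic that stores charge",
    "fuse": "a device used for protecting circuits",
}
SEEDS = [("diode", "silicon"), ("resistor", "limiting")]  # entity-attribute value pairs
SYNONYMS = {"built": "made", "from": "of"}  # word -> canonical synonym


def normalize(tokens):
    """Synonym expansion, simplified to mapping each token to a canonical form."""
    return tuple(SYNONYMS.get(t, t) for t in tokens)


def rough_patterns(glosses, seeds, width=2):
    """Step 1: the `width` tokens left of each seed value form a rough pattern instance."""
    patterns = []
    for entity, value in seeds:
        tokens = glosses[entity].split()
        if value in tokens:
            i = tokens.index(value)
            if i >= width:
                patterns.append(normalize(tokens[i - width:i]))
    return patterns


def jaccard(a, b):
    """Token-overlap similarity; an assumed stand-in for semantic similarity."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def cluster(patterns, threshold=0.5):
    """Step 2: greedy single-pass clustering; each group stands for one attribute type."""
    groups = []
    for p in patterns:
        for g in groups:
            if any(jaccard(p, q) >= threshold for q in g):
                if p not in g:
                    g.append(p)
                break
        else:
            groups.append([p])
    return groups


def extract(glosses, groups, known):
    """Step 3: apply every pattern to glosses of entities outside the seed set."""
    found = defaultdict(list)
    for entity, gloss in glosses.items():
        if entity in known:
            continue
        orig = gloss.split()
        norm = normalize(orig)
        for gid, group in enumerate(groups):
            for pat in group:
                n = len(pat)
                for i in range(len(norm) - n):
                    if norm[i:i + n] == pat:
                        # Report the token right after the matched pattern as the value.
                        found[entity].append((gid, orig[i + n]))
    return dict(found)


groups = cluster(rough_patterns(GLOSSES, SEEDS))
result = extract(GLOSSES, groups, {entity for entity, _ in SEEDS})
print(result)
```

In this toy run the synonym table lets the pattern learned from "made of" also match "built from", which is the kind of coverage gain the abstract attributes to synonym expansion. A real system would need Chinese word segmentation and a proper similarity resource, both outside the scope of this sketch.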
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
2011, No. 4, pp. 1-3, 16 (4 pages in total)
Funding
National Natural Science Foundation of China (Grant No. 60873135)
Keywords
Information extraction
Pattern instance
Similarity
Generic types