Abstract
To address the difficulty of acquiring entity relations automatically, this paper combines Maximum Entropy (ME) learning with the Bootstrapping algorithm. Starting from seed patterns and seed words, a Bootstrapping procedure that draws on the idea of scalar clusters collects the feature words required by the ME algorithm; these feature words are semantically similar to the seed words. For the ME algorithm, nine features are designed systematically, covering morphology, syntax, semantics and other aspects, so that the actual situation of the entities in the text is described as comprehensively as possible. A system framework for the experiments was built, and automatic entity relation extraction was implemented. The experimental results indicate that the method is effective for extracting entity relations automatically.
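The seed-word expansion step mentioned in the abstract might look roughly like the following sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no details, so the co-occurrence window (one sentence), the similarity threshold, the number of rounds, and every function name here are hypothetical. Seed patterns are omitted, and the scalar-cluster idea is approximated by comparing words through the cosine of their co-occurrence vectors.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence(sentences):
    """Count how often two distinct words appear in the same sentence."""
    counts = {}
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def expand_seeds(sentences, seeds, threshold=0.7, rounds=3):
    """Bootstrapping loop (assumed form): repeatedly add words whose
    co-occurrence vector is close to that of an already accepted word."""
    counts = cooccurrence(sentences)
    learned = set(seeds)
    for _ in range(rounds):
        new = {
            w for w in counts
            if w not in learned
            and any(s in counts and cosine(counts[w], counts[s]) >= threshold
                    for s in learned)
        }
        if not new:  # nothing more to learn, stop early
            break
        learned |= new
    return learned
```

In a pipeline of the kind the abstract describes, the returned word set would presumably serve as one of the features passed to the ME classifier; how the paper actually encodes it is not stated.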
Source
《哈尔滨工程大学学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2006, Issue B07, pp. 370-373 (4 pages)
Journal of Harbin Engineering University
Funding
Supported by a major fund project under the computer theme of the National 863 Program of China (2001AA114210).