
Chinese Named Entity Relation Extraction Based on Syntactic and Semantic Features (cited by: 74)
Abstract: As a foundation of semantic networks and ontologies, entity relation extraction is widely used in information retrieval, machine translation, and automatic question answering systems. Its core problem is the selection and extraction of relation features. Chinese long sentences have complex structures and often contain many entities, which, together with data sparsity, makes Chinese relation detection and relation extraction challenging. To address these problems, a method based on syntactic and semantic features is proposed. A dependency relation combination feature is obtained by combining the dependency relations of the two entities, and a nearest syntactically dependent verb feature is selected using dependency parsing and part-of-speech (POS) tagging. Both new features are added to feature-based relation detection and relation extraction using a support vector machine (SVM), and experiments are run on a corpus of real tourism-domain text. The experiments show that the two features, drawn from syntax and semantics, effectively improve the performance of relation detection and relation extraction, with precision, recall, and F1 all exceeding those of existing methods. In addition, the nearest syntactically dependent verb feature is highly effective: it contributes most to relation types with sparse data, and it outperforms current classic verb-feature-based methods on both relation detection and relation extraction.
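The two features described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` structure, the function names, the toy sentence, the LTP-style relation labels (`SBV`, `VOB`, `HED`), the use of `"v"` as the verb tag, and the rule of picking the verb reached in the fewest head-chain hops are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    word: str
    pos: str     # part-of-speech tag; "v" marks a verb in this toy tagset (assumption)
    head: int    # index of the syntactic head, -1 for the root
    deprel: str  # dependency relation to the head (labels here are assumptions)


def dependency_combination(tokens: List[Token], e1: int, e2: int) -> str:
    """Join the dependency relations of the two entity tokens into one feature."""
    return f"{tokens[e1].deprel}_{tokens[e2].deprel}"


def hops_to_verb(tokens: List[Token], start: int) -> Optional[Tuple[int, int]]:
    """Climb the head chain from `start`; return (hops, verb_index) or None."""
    hops, idx = 0, tokens[start].head
    seen = set()  # guards against malformed parses that contain cycles
    while idx != -1 and idx not in seen:
        seen.add(idx)
        hops += 1
        if tokens[idx].pos == "v":
            return hops, idx
        idx = tokens[idx].head
    return None


def nearest_dependency_verb(tokens: List[Token], e1: int, e2: int) -> Optional[str]:
    """Pick the verb reachable from either entity in the fewest hops.

    Ties are broken by the smaller token index (an arbitrary choice).
    """
    found = [c for c in (hops_to_verb(tokens, e1), hops_to_verb(tokens, e2)) if c]
    if not found:
        return None
    _, verb_index = min(found)
    return tokens[verb_index].word


# Hypothetical tourism-style sentence: "庐山 位于 江西省" (Lushan is located in Jiangxi Province).
sentence = [
    Token("庐山", "ns", 1, "SBV"),
    Token("位于", "v", -1, "HED"),
    Token("江西省", "ns", 1, "VOB"),
]
features = {
    "dep_combination": dependency_combination(sentence, 0, 2),
    "nearest_verb": nearest_dependency_verb(sentence, 0, 2),
}
```

In a feature-based pipeline, a dictionary like `features` would be merged with the other lexical and entity features, vectorized, and passed to an SVM classifier, first for relation detection and then for relation type classification.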
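Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2016, No. 2, pp. 284-302 (19 pages)
Funding: National Natural Science Foundation of China (61173146, 61562032, 61363039, 61363010, 61462037); Science and Technology Landing Project of Higher Education Institutions of Jiangxi Province (KJLD12022); Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ12733, GJJ13249)
Keywords: relationship extraction; relationship detection; syntactic feature; semantic feature; support vector machine (SVM)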
