Abstract
Event extraction is an active research topic in natural language processing (NLP). Most existing event extraction models are trained on small-scale corpora and cannot be applied to large-scale open-domain settings. To address the difficulty of representing events in large-scale open-domain event extraction, we propose a method for computing event vectors based on Zipf’s co-occurrence matrix factorization. We first extract event tuples from open-domain corpora to serve as event labels, and then abstract, prune, and disambiguate the tuples. Next, we use Zipf’s co-occurrence matrix to represent the context distribution of events and factorize it with principal component analysis (PCA) to obtain initial event vectors, which an autoencoder then transforms nonlinearly. We evaluate the resulting vectors on two tasks, nearest-neighbor detection and event identification. The results show that event vectors obtained from Zipf’s co-occurrence matrix factorization capture similarity and relatedness between events globally and avoid the semantic shift caused by overly fine-grained encoding.
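The factorization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the Zipf weighting, so plain co-occurrence counts stand in for it, the autoencoder step is omitted, and the toy event sequences, the function names `cooccurrence_matrix` and `pca_vectors`, and the `window` parameter are all assumptions made for illustration.

```python
import numpy as np

def cooccurrence_matrix(sequences, window=1):
    """Count how often two event labels occur within `window` positions of each other.

    Plain counts are an assumption; the paper's Zipf-based weighting is not
    specified in the abstract and is not reproduced here.
    """
    vocab = sorted({event for seq in sequences for event in seq})
    index = {event: i for i, event in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for seq in sequences:
        for i, event in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[event], index[seq[j]]] += 1
    return vocab, counts

def pca_vectors(matrix, dim):
    """Project each row onto the top `dim` principal components (PCA via SVD)."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dim] * s[:dim]

# Hypothetical event-label sequences, one per document.
sequences = [
    ["arrest", "charge", "trial"],
    ["arrest", "charge", "release"],
    ["attack", "injure", "arrest"],
]
vocab, counts = cooccurrence_matrix(sequences)
vectors = pca_vectors(counts, dim=2)  # one 2-dimensional vector per event label
```

Each row of `vectors` is an initial vector for the event label at the same position in `vocab`; in the pipeline the abstract describes, such vectors would then be passed through an autoencoder.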
Authors
高李政
周刚
黄永忠
罗军勇
王树伟
GAO Li-zheng; ZHOU Gang; HUANG Yong-zhong; LUO Jun-yong; WANG Shu-wei (State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China; School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541000, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2020, Issue 10, pp. 207-214 (8 pages)
Computer Science
Funding
National Natural Science Foundation of China (61602508, 61866008).
Keywords
Open domain event extraction
Zipf’s co-occurrence matrix
Context distribution
Event representation