Abstract
Data sparsity and noise propagation are major difficulties in Chinese relation extraction. To alleviate the data sparsity of traditional word features, we fuse position features, shortest dependency path features, and N-gram features when organizing text features, and increase the weight of key features. The combined features also mitigate noise propagation in the text and make syntactic features more reliable under sparsity. In addition, an attention mechanism is added to a conventional bidirectional LSTM network so that the model focuses on the more important features and is less affected by noise. Experiments on a public person-relation corpus show that the method performs well on Chinese relation extraction, and it offers methodological support for fields such as information extraction and knowledge graphs.
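The abstract names two building blocks, position features and attention over recurrent hidden states, without giving formulas. The following is only a minimal sketch of what such components commonly look like, not the authors' implementation. The function names (`relative_positions`, `attention_pool`), the tanh-based scoring form, the example sentence, and the toy hidden states are all assumptions made for illustration.

```python
import math

def relative_positions(tokens, e1_idx, e2_idx):
    """Position features (assumed form): signed distance of every token
    to each of the two entity mentions."""
    return [(i - e1_idx, i - e2_idx) for i in range(len(tokens))]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Attention pooling (assumed form): score each hidden state h by
    query . tanh(h), normalize with softmax, and return the weights
    together with the weighted sum of the hidden states."""
    scores = [
        sum(q * math.tanh(x) for q, x in zip(query, h))
        for h in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, pooled

if __name__ == "__main__":
    # Hypothetical segmented sentence with entity mentions at indices 0 and 2.
    tokens = ["张三", "是", "李四", "的", "父亲"]
    print(relative_positions(tokens, 0, 2))

    # Toy 2-dimensional vectors standing in for BiLSTM outputs.
    states = [[0.1, 0.3], [0.0, 0.2], [0.4, 0.1], [0.0, 0.0], [0.9, 0.7]]
    weights, sentence_vector = attention_pool(states, query=[1.0, 1.0])
    print(weights, sentence_vector)
```

In a full model the hidden states would come from a trained bidirectional LSTM and the query vector would be learned. Here both are fixed by hand so the weighting step can be inspected in isolation.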
Authors
陈振彬
叶颖雅
冯浩男
李明轩
陈珂
CHEN Zhenbin; YE Yingya; FENG Haonan; LI Mingxuan; CHEN Ke (College of Computer Science and Technology, Guangdong University of Petrochemical Technology, Maoming 525000, China)
Source
《广东石油化工学院学报》
2019, No. 4, pp. 36-40 (5 pages)
Journal of Guangdong University of Petrochemical Technology
Funding
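Natural Science Foundation of Guangdong Province (2016A030307049, 2018A030307032)
Special Fund for Discipline and Specialty Construction of Higher Education Institutions of Guangdong Province (2016KTSCX090)
College Students' Innovation and Entrepreneurship Training and Cultivation Program (733013, 733435, 733437)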