基于词信息嵌入的汉语构词结构识别研究 (cited by: 1)

Chinese Word-Formation Prediction Based on Lexical Level Embedding
Abstract  Chinese is a paratactic language, and its word-formation structures describe how the components of a word combine, which makes them key to recognizing and understanding word meaning. In Chinese information processing, most previous work on word-formation prediction reuses coarse-grained syntactic labels and models mainly inter-word information such as context, overlooking the role of inner-word information such as morpheme meaning and word meaning. In this paper, we adopt a word-formation label system defined from a linguistic perspective, construct a dataset of Chinese word-formation structures and related information, and propose a model based on BiLSTM and self-attention to explore how inner-word and inter-word information affects word-formation prediction and what performance can be reached. Experiments yield good prediction results, with an accuracy of 77.87% and an F1 score of 78.36%. Comparative tests further show that morpheme meaning, an inner-word feature, contributes significantly to word-formation prediction, whereas context, an inter-word feature, contributes only weakly and is markedly unstable.
Authors  ZHENG Hua (郑婳); LIU Yang (刘扬); YIN Yaqi (殷雅琦); WANG Yue (王悦); DAI Damai (代达劢) (School of Computer Science, Peking University, Beijing 100871, China; Key Lab of Computational Linguistics (MOE), Peking University, Beijing 100871, China)
Source  Journal of Chinese Information Processing (《中文信息学报》), CSCD, PKU Core Journal; 2022, No. 5, pp. 31-40, 66 (11 pages in total)
Funding  National Natural Science Foundation of China (62036001); National Social Science Fund of China (18ZDA295).
Keywords  Chinese word-formation; word features; morphemes
  • Related literature

References: 15

Secondary references: 99

Documents sharing references: 206

Co-cited documents: 8

Citing documents: 1
