
基于长时间跨度语料的词义演变计算研究 (cited by: 1)

A Study on Semantic Evolution Computation with Diachronic Corpus
Abstract: This paper collects a continuous diachronic corpus of Chinese newspapers and periodicals spanning 144 years, from the late Qing Dynasty to the 21st century. Using two families of methods, statistical analysis and distributed word representations, it computes and helps identify diachronic semantic change in Chinese words. Words that have undergone semantic change are first mined with statistical indicators such as TF-IDF and word frequency ratio, together with the content words that co-occur with the target word in a passage and the degree of overlap between those co-occurrence sets. To align word embeddings trained on different time periods of the diachronic corpus, three methods are examined: SGNS training followed by orthogonal matrix projection, SGNS incremental training, and second-order word vector representation based on "anchor words". Of the three, SGNS incremental training performs best. Finally, the automatically discovered semantic changes are analyzed through the diachronic self-similarity of the target word and its diachronic similarity to anchor words, and nearest-neighbor words are used to clarify the target word's meaning before and after the change.
Authors: 孙琦鑫 (SUN Qixin), 饶高琦 (RAO Gaoqi), 荀恩东 (XUN Endong). Affiliations: School of Information Science, Beijing Language and Culture University, Beijing 100083, China; Beijing Advanced Innovation Center for Language Resources, Beijing Language and Culture University, Beijing 100083, China; Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083, China
Source: 《中文信息学报》 (Journal of Chinese Information Processing), indexed in CSCD and the PKU Core Journals list (北大核心), 2020, No. 8, pp. 10-22 (13 pages)
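Funding: Ministry of Education Humanities and Social Sciences Fund (20YJC740050); Beijing Language and Culture University Young Talents Training Program (1090/501321102); Fundamental Research Funds for the Central Universities, Beijing Language and Culture University (19YJ130005).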
Keywords: word semantic evolution; diachronic corpus; distributed representation
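
The abstract names orthogonal matrix projection as one way to align embeddings from different periods, and diachronic self-similarity as the signal for change. The following is a minimal sketch of that idea on synthetic vectors, not the authors' implementation. The function names (`orthogonal_align`, `cosine`), the toy data, and the choice to fit the projection only on words assumed to be stable are all illustrative assumptions; the paper's actual SGNS training setup is not reproduced here.

```python
import numpy as np


def orthogonal_align(source, target):
    """Return the orthogonal matrix Q minimizing ||source @ Q - target||_F.

    This is the closed-form orthogonal Procrustes solution: with
    U S V^T = SVD(source^T target), the minimizer is Q = U V^T.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-ins for embeddings of the same 6 words in two periods.
rng = np.random.default_rng(0)
early = rng.normal(size=(6, 4))

# Assumption: the later space is the earlier one under an unknown rotation,
# mimicking the arbitrary orientation of independently trained embeddings.
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
late = early @ rotation

# Simulate a meaning change for word 0 by flipping its later vector.
late[0] = -late[0]

# Fit the projection on the words assumed stable (rows 1..5), then project
# every early vector into the later space.
q = orthogonal_align(early[1:], late[1:])
aligned_early = early @ q

# Diachronic self-similarity: cosine between a word's aligned early vector
# and its later vector. Low values flag candidate semantic change.
self_similarity = [cosine(aligned_early[i], late[i]) for i in range(len(early))]
```

In this toy setup the stable words recover a self-similarity of about 1, while the perturbed word scores far lower. On real data the set of stable words is not known in advance, so the projection would typically be fit on a shared vocabulary and the lowest-scoring words inspected by hand, for example through their nearest neighbors in each period.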