Abstract
Starting from the view that word sense disambiguation means returning a word's sense to its context, this paper proposes a dynamic word sense disambiguation method based on full-sentence co-occurrence with a node word. The method first uses the full sentence as the window that delimits the node word's context of use. It then extracts words semantically related to the node word with statistical measures, namely mutual information (MI), the chi-square (χ²) test, and the relative word rank ratio (RRWR), and builds a semantic category database of these related words with reference to Tongyici Cilin (《同义词词林》, A Dictionary of Synonyms). Finally, using co-occurrence frequency as the weighting coefficient, it disambiguates low- and medium-frequency polysemous co-occurring words according to the distribution rate of monosemous words across semantic clusters. In a test on 1,030 polysemous words with fewer than 7 sense categories that co-occur with "meili" (美丽, beautiful), the method achieved an accuracy of 85.2%.
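The abstract names the statistics but not their exact formulas, thresholds, or voting rule. The Python sketch below is a minimal illustration under stated assumptions, not the authors' published procedure: each sentence is one co-occurrence window; MI is pointwise MI over sentence counts; χ² is the standard Pearson statistic on a 2×2 sentence-level contingency table; RRWR is omitted because its definition is not given here; the helper names `association_scores`, `related_words`, and `disambiguate`, the default thresholds, and the toy sense labels are hypothetical. The voting rule (choose the candidate category with the largest frequency-weighted share among monosemous related words) is one plausible reading of "semantic-cluster distribution rate of monosemous words".

```python
import math
from collections import Counter

def association_scores(sentences, node):
    """Return {word: (a, mi, chi2)} for every word sharing a sentence with `node`.

    Assumption: each sentence (a list of tokens) is one window; counts are
    numbers of sentences, not token occurrences.
    """
    n = len(sentences)
    sets = [set(s) for s in sentences]
    node_df = sum(1 for s in sets if node in s)
    word_df = Counter(w for s in sets for w in s)
    joint = Counter(w for s in sets if node in s for w in s if w != node)
    scores = {}
    for w, a in joint.items():
        b = node_df - a              # sentences with node but not w
        c = word_df[w] - a           # sentences with w but not node
        d = n - a - b - c            # sentences with neither
        mi = math.log2(a * n / (node_df * word_df[w]))  # pointwise MI
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        scores[w] = (a, mi, chi2)
    return scores

def related_words(scores, min_mi=0.0, min_chi2=3.84):
    """Keep words passing both hypothetical thresholds; value = co-occurrence count.

    3.84 is the chi-square critical value for df=1 at p=0.05.
    """
    return {w: a for w, (a, mi, chi2) in scores.items()
            if mi > min_mi and chi2 >= min_chi2}

def disambiguate(candidate_senses, related, sense_of):
    """Pick a sense category for a polysemous word (hypothetical voting rule).

    related:  {word: co-occurrence frequency with the node word}
    sense_of: {word: set of category labels}; monosemous words have one label
    """
    mass, total = Counter(), 0
    for w, freq in related.items():
        cats = sense_of.get(w, set())
        if len(cats) == 1:           # only monosemous words vote
            mass[next(iter(cats))] += freq
            total += freq
    if total == 0:
        return None
    # Frequency-weighted share of each candidate category among monosemous words.
    rates = {c: mass[c] / total for c in candidate_senses}
    best = max(rates, key=rates.get)
    return best if rates[best] > 0 else None

if __name__ == "__main__":
    corpus = [["beautiful", "scenery", "lake"], ["beautiful", "girl", "lake"],
              ["scenery", "mountain"], ["beautiful", "scenery"]]
    print(association_scores(corpus, "beautiful"))
    related = {"scenery": 5, "lake": 3, "girl": 2, "flower": 4}
    senses = {"scenery": {"landscape"}, "lake": {"landscape"},
              "girl": {"person"}, "flower": {"plant", "landscape"}}
    print(disambiguate(["person", "landscape"], related, senses))
```

In the toy call, "flower" is itself polysemous and therefore does not vote; the monosemous words give "landscape" a weighted share of 8/10 against 2/10 for "person", so "landscape" is chosen. Restricting votes to monosemous words avoids circularity, since their category needs no disambiguation of its own.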
Authors
Yan Yaya (闫亚亚); Xing Hongbing (邢红兵)
(College of Chinese Language and Culture, Jinan University, Guangzhou, Guangdong 510610; Institute on Educational Policy and Evaluation of International Students, Beijing Language and Culture University, Beijing 100083)
Source
Linguistic Sciences (《语言科学》), Peking University core journal (北大核心)
2024, No. 4, pp. 354-364 (11 pages)
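Funding
National Natural Science Foundation of China (32271091)
Youth Project of the 2022 International Chinese Language Education Research Program, Center for Language Education and Cooperation, Ministry of Education (22YH69D)
This paper presents interim results of the projects above.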
Keywords
node word
full-sentence co-occurrence
word sense disambiguation
semantic clustering
unsupervised learning