Abstract
Annotating word meanings is an important task in the collation of ancient Chinese books, and manual interpretation is time-consuming and laborious. Drawing on research in word sense disambiguation for modern Chinese, this paper proposes an improved unsupervised disambiguation method for ancient Chinese based on the vector space model. A knowledge base of sense words for ancient Chinese polysemous words is built, and both the contexts of each polysemous word to be disambiguated and the word's senses are mapped into the vector space model to carry out disambiguation. Using the full-text database of Chinese agricultural ancient books as the statistical corpus, a sense-annotation experiment was conducted on 10 typical ancient Chinese polysemous words, covering 29 senses and 1,836 contexts to be disambiguated. The average disambiguation accuracy reached 79.5%.
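As a rough illustration of the general idea of placing a context and the candidate senses in one vector space, the minimal Python sketch below scores each sense by cosine similarity between bag-of-words vectors and picks the highest. It is only a sketch under stated assumptions: the function names, the toy sense word lists, and the example tokens are invented here and are not taken from the paper, and the paper's specific improvements to the model are not described in the abstract, so they are not reproduced.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def disambiguate(context_tokens, sense_words):
    """Return (best_sense, scores) for one occurrence of a polysemous word.

    context_tokens: tokens surrounding the target word.
    sense_words: dict mapping a sense label to its list of indicator words
                 (a stand-in for a sense-word knowledge base).
    """
    context_vec = Counter(context_tokens)
    scores = {
        sense: cosine(context_vec, Counter(words))
        for sense, words in sense_words.items()
    }
    best_sense = max(scores, key=scores.get)
    return best_sense, scores


# Hypothetical toy data, invented purely for illustration.
SENSE_WORDS = {
    "grain": ["种", "收", "粟", "麦"],
    "valley": ["山", "水", "溪", "深"],
}

if __name__ == "__main__":
    context = ["春", "种", "谷", "秋", "收"]
    best, scores = disambiguate(context, SENSE_WORDS)
    print(best, scores)
```

Raw term counts are used here only to keep the sketch short; a weighting scheme such as TF-IDF could be substituted without changing the overall structure.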
Source
《图书情报工作》
CSSCI
Peking University Core Journals (北大核心)
2013, No. 2, pp. 114-118 (5 pages)
Library and Information Service
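Funding
National Social Science Fund of China project "Research on Intelligent Technologies for the Collation and Development of Ancient Books" (Grant No. 08ATQ002)
A research outcome of the Specialized Research Fund for the Doctoral Program of Higher Education project "Design and Construction of an Automatic Compilation and Annotation System for Ancient Agricultural Book Materials" (Grant No. 20090097110033)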
Keywords
vector space model
word sense disambiguation
ancient Chinese