Abstract
Features for word sense disambiguation (WSD) are drawn from context. Japanese shares linguistic properties with both Chinese and English, which makes feature extraction more complicated. To address these characteristics of Japanese, this work builds on a logical model of WSD and exploits the maximum entropy model's ability to integrate heterogeneous information: a deep feature extraction method introduces semantic and syntactic features for resolving ambiguity. In addition, to avoid skewed assignment of senses to isolated words, the BeamSearch algorithm is used to tag word sense sequences. Experimental results show that, compared with methods that use only surface lexical features, the Japanese WSD system built in this paper improves disambiguation accuracy by 2% to 3%, and improves accuracy on verbs by 5%.
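The sequence-tagging step mentioned in the abstract can be illustrated with a minimal sketch of beam search over candidate senses. This is not the paper's implementation: the function `beam_search`, the scorer `toy_log_prob`, the sense labels, and the probability tables are all hypothetical, and the toy scorer merely stands in for the conditional log-probabilities that a trained maximum entropy model would supply.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer signature: log P(sense | position, senses chosen so far).
# In a real system this would come from a trained maximum entropy model.
Scorer = Callable[[int, str, Tuple[str, ...]], float]


def beam_search(
    candidates: Sequence[Sequence[str]],
    log_prob: Scorer,
    beam_width: int = 3,
) -> Tuple[Tuple[str, ...], float]:
    """Return the best-scoring sense sequence found and its log-probability."""
    beam: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    for i, senses in enumerate(candidates):
        # Extend every partial sequence in the beam with every candidate sense.
        expanded = [
            (history + (s,), score + log_prob(i, s, history))
            for history, score in beam
            for s in senses
        ]
        # Keep only the beam_width highest-scoring partial sequences.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


# Toy probability tables (made up for illustration only).
FIRST = {"A1": 0.45, "A2": 0.55}
SECOND = {
    ("A1", "B1"): 0.9, ("A1", "B2"): 0.1,
    ("A2", "B1"): 0.4, ("A2", "B2"): 0.6,
}


def toy_log_prob(i: int, sense: str, history: Tuple[str, ...]) -> float:
    """Stand-in for a model score; conditions on the previous sense only."""
    if i == 0:
        return math.log(FIRST[sense])
    return math.log(SECOND[(history[-1], sense)])


candidates = [["A1", "A2"], ["B1", "B2"]]
greedy_path, _ = beam_search(candidates, toy_log_prob, beam_width=1)
beam_path, beam_score = beam_search(candidates, toy_log_prob, beam_width=2)
```

With these toy numbers, a beam of width 1 (greedy, word-by-word choice) commits to `A2` and ends with `("A2", "B2")`, while a beam of width 2 keeps `A1` alive long enough to find the higher-probability sequence `("A1", "B1")`. This is one way to read the abstract's point about tagging sense sequences rather than assigning senses to words in isolation.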
Source
《北京科技大学学报》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2010, No. 2, pp. 263-269 (7 pages)
Journal of University of Science and Technology Beijing
Funding
National High Technology Research and Development Program of China (No. 2007AA01Z170)
Keywords
natural language processing
word sense disambiguation
maximum entropy model
feature extraction