Abstract
Identifying nominalization compounds is a difficult problem in Chinese compound recognition. The difficulty is that a co-occurring verb and noun in Chinese can form either a verb phrase or a nominalization compound. Traditional approaches to Chinese compound recognition usually rely only on corpus statistics, and their results are often unsatisfactory. This paper uses a Maximum Entropy model that, in addition to baseline contextual features, adopts lexical features and Web features to judge nominalization candidates formed by co-occurring verbs and nouns. The experimental results are encouraging: precision reaches 86.31% and recall reaches 70.00%.
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal (北大核心)
2008, Issue 9, pp. 283-285 (3 pages)
Funding
National Natural Science Foundation of China (60496326)
Keywords
Maximum entropy model
Nominal compounds (NC)
Compound ability (CA)
Thesaurus
Web features
Pointwise mutual information based on information retrieval (PMI-IR)
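
The PMI-IR keyword refers to estimating pointwise mutual information from search-engine hit counts. The abstract does not say how the paper computes its Web features, so the following is only a minimal sketch of the standard PMI formula applied to hit counts. The function name `pmi_ir`, its parameters, and the example counts are hypothetical and are not taken from the paper.

```python
import math


def pmi_ir(hits_xy: int, hits_x: int, hits_y: int, total_pages: int) -> float:
    """Estimate PMI(x, y) = log2(P(x, y) / (P(x) * P(y))) from hit counts.

    Each probability is approximated as hits / total_pages, which simplifies
    to log2(hits_xy * total_pages / (hits_x * hits_y)).
    All arguments are assumed to be counts supplied by the caller; this
    sketch does not query any search engine.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    # With a zero count the score is undefined; treat it as "no association".
    if hits_xy <= 0 or hits_x <= 0 or hits_y <= 0:
        return float("-inf")
    return math.log2(hits_xy * total_pages / (hits_x * hits_y))


# Hypothetical counts for a verb-noun pair: pages containing the pair,
# pages containing the verb, pages containing the noun, and index size.
score = pmi_ir(hits_xy=400, hits_x=1000, hits_y=1000, total_pages=10000)
print(score)
```

A higher score suggests that the verb and noun co-occur on the Web more often than chance would predict. A score like this could serve as one real-valued feature alongside the contextual and lexical features, though how the paper discretizes or combines it is not stated in the abstract.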