Improving the Collocation Extraction Method Using an Untagged Corpus for Persian Word Sense Disambiguation

Abstract: Word sense disambiguation is used in many areas of natural language processing. One approach to disambiguation is the decision list algorithm, a supervised method. Supervised methods are regarded as among the most accurate machine learning algorithms, but they suffer from the knowledge acquisition bottleneck: their performance depends on the size of the tagged training set, which is difficult, time-consuming, and costly to prepare. The method proposed in this article improves the performance of the decision list algorithm when only a small tagged training set is available. It applies a statistical collocation extraction method to a large untagged corpus in order to identify the more important collocations, which are the features used to build the learning hypotheses. Weighting these features improves the efficiency and accuracy of a decision list algorithm trained on a small training corpus.
Authors: Noushin Riahi; Fatemeh Sedghi (Computer Engineering Department, Alzahra University, Tehran, Iran)
Source: Journal of Computer and Communications, 2016, Issue 4, pp. 109-124 (16 pages)
Keywords: Collocation Extraction; Word Sense Disambiguation; Untagged Corpus; Decision List
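
The idea summarized in the abstract can be illustrated with a small sketch. The abstract does not say which statistical measure or weighting formula the authors use, so everything below is an assumption made for illustration: sentence-level pointwise mutual information (PMI) stands in for the collocation statistic, the multiplicative boost `1 + max(PMI, 0)` is a hypothetical weighting scheme, and the function names and the English toy data are invented. It is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def pmi_weights(untagged, target):
    """Sentence-level PMI between `target` and each co-occurring word (assumed statistic)."""
    n = len(untagged)
    word_df, joint_df = Counter(), Counter()
    for sent in untagged:
        words = set(sent)
        word_df.update(words)
        if target in words:
            joint_df.update(words - {target})
    target_df = word_df[target]
    return {w: math.log(joint * n / (word_df[w] * target_df))
            for w, joint in joint_df.items()}

def train_decision_list(tagged, target, weights, alpha=0.1):
    """Build rules (score, feature, sense), strongest first."""
    counts = defaultdict(Counter)  # feature -> sense -> count
    for tokens, sense in tagged:
        for w in set(tokens) - {target}:
            counts[w][sense] += 1
    rules = []
    for w, by_sense in counts.items():
        sense, best = by_sense.most_common(1)[0]
        rest = sum(by_sense.values()) - best
        # Smoothed log-likelihood ratio of the best sense against all others.
        llr = math.log((best + alpha) / (rest + alpha))
        if llr <= 0:
            continue  # feature does not favour any sense
        # Hypothetical weighting: boost features that collocate with the target.
        boost = 1.0 + max(weights.get(w, 0.0), 0.0)
        rules.append((llr * boost, w, sense))
    rules.sort(key=lambda r: r[0], reverse=True)
    return rules

def classify(tokens, rules, default):
    """Return the sense of the first (highest-scoring) rule whose feature is present."""
    present = set(tokens)
    for _, w, sense in rules:
        if w in present:
            return sense
    return default

# Toy data, invented for this sketch.
untagged = [s.split() for s in (
    "the bank approved the loan",
    "the bank raised the interest",
    "the river bank was muddy",
    "the dog crossed the road",
    "the cat sat on the mat",
    "a loan from the bank",
)]
tagged = [
    ("the bank gave a loan".split(), "finance"),
    ("the bank of the river".split(), "river"),
]

weights = pmi_weights(untagged, "bank")
rules = train_decision_list(tagged, "bank", weights)
print(classify("she walked along the river bank".split(), rules, "finance"))
```

With only two tagged sentences, every informative feature gets the same raw log-likelihood ratio. The untagged corpus is what separates them: words that co-occur with the target more often than chance, such as "loan" and "river" here, move up the list, while a word like "the", which appears everywhere, receives no boost.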