Multiword Expressions (MWEs) appear frequently and ungrammatically in natural languages. Identifying MWEs in free texts is a very challenging problem. This paper proposes a knowledge-free, unsupervised, and language...Multiword Expressions (MWEs) appear frequently and ungrammatically in natural languages. Identifying MWEs in free texts is a very challenging problem. This paper proposes a knowledge-free, unsupervised, and languageindependent Multiword Expression Distance (MED). The new metric is derived from an accepted physical principle, measures the distance from an n-gram to its semantics, and outperforms other state-of-the-art methods on MWEs in two applications: question answering and named entity extraction.展开更多
基金supported mainly by Canada's IDRC Research Chair in Information Technology Program,under Grant No.104519006supported by the National Natural Science Foundation of China under Grant No.60973104+2 种基金the National Basic Research 973 Program of China under Grant No.2007CB311003NSERC Grant OGP0046506Canada Research Chair Program,MITACS,an NSERC Collaborative Grant,and Ontario's Premier's Discovery Award
文摘Multiword Expressions (MWEs) appear frequently and ungrammatically in natural languages. Identifying MWEs in free texts is a very challenging problem. This paper proposes a knowledge-free, unsupervised, and languageindependent Multiword Expression Distance (MED). The new metric is derived from an accepted physical principle, measures the distance from an n-gram to its semantics, and outperforms other state-of-the-art methods on MWEs in two applications: question answering and named entity extraction.