We present a novel approach for extracting noun phrases in general, and named entities in particular, from a digital repository of text documents. The problem of coreference resolution is divided into two subproblems: pronoun resolution and non-pronominal resolution. A rule-based technique is used for pronoun resolution, while a learning approach is used for non-pronominal resolution. For named entity resolution, ambiguity arises mainly from polysemy and synonymy. The proposed approach addresses both problems with the help of WordNet and the Word Sense Disambiguation tool. To our knowledge, the proposed approach outperforms several baseline techniques, achieving a higher balanced F-measure, which is the harmonic mean of precision and recall. The improvements in system performance are due to the filtering of antecedents for the anaphor based on several linguistic disagreements, the use of a hybrid approach, and an extended feature vector that includes more linguistic details in the learning technique.
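The balanced F-measure mentioned in the abstract can be illustrated with a minimal sketch. The function name `f_measure` and the `beta` parameter are assumptions introduced here for illustration; the abstract only states that the balanced F-measure is the harmonic mean of precision and recall, and it does not describe how the authors computed it. The input values in the usage lines are made up and are not results from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the balanced F-measure.

    With beta = 1 this is the harmonic mean of precision and recall:
    F = 2 * P * R / (P + R).
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Illustrative values only (hypothetical, not taken from the paper).
balanced = f_measure(0.5, 0.5)
skewed = f_measure(1.0, 0.5)
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot reach a high balanced F-measure by trading recall for precision or the reverse.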