Abstract
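To make automatic synonym mining more efficient, this paper proposes methods for automatically identifying and mining synonyms from dictionary definitions. It uses a hyperlink analysis algorithm and a pattern matching algorithm to extract synonyms from two different angles. The first part treats the relation between a defining word and the word it defines as a link. For a given word, all related words that share such a link with it are assembled into a word graph, in which each node represents a related word and each arc represents a defining/defined relation between two words. Hyperlink analysis combined with the PageRank algorithm is then used to compute a PageRank value for each word, and this value is taken as a measure of the semantic similarity between words. Finally, a candidate synonym set is generated for each word, and the best synonyms are recommended after filtering by a set of selection principles and methods. The second part uses word definition patterns: it analyzes how words are defined, summarizes the patterns in which synonyms appear in dictionary definitions, and then identifies and mines synonyms by pattern matching. In addition, the pattern matching method was tested on mining synonyms from Web pages and journal articles. The test results show that using pattern matching and hyperlink analysis to identify and mine synonyms automatically is feasible and practical.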
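The first part described above (word graph plus PageRank) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy `definitions` dictionary, the function names `build_word_graph`, `pagerank`, and `candidate_synonyms`, the arc direction (headword to each word in its definition), the damping factor 0.85, and the fixed iteration count. The paper's filtering principles are not specified in the abstract, so the sketch only ranks candidates.

```python
def build_word_graph(definitions, seed):
    """Build the word graph around `seed` (hypothetical helper).

    Nodes: the seed, the words used in its definition, and the headwords
    whose definitions use the seed. Arcs (assumed direction): headword ->
    each word in its definition, restricted to the selected nodes.
    """
    nodes = {seed} | set(definitions.get(seed, ()))
    nodes |= {w for w, d in definitions.items() if seed in d}
    return {n: [m for m in definitions.get(n, ()) if m in nodes and m != n]
            for n in nodes}


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an adjacency dict."""
    nodes = list(edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing arcs is spread evenly.
        dangling = sum(rank[v] for v in nodes if not edges[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if edges[v]:
                share = damping * rank[v] / len(edges[v])
                for m in edges[v]:
                    new[m] += share
        rank = new
    return rank


def candidate_synonyms(definitions, seed):
    """Rank the words related to `seed` by PageRank value, highest first."""
    rank = pagerank(build_word_graph(definitions, seed))
    return sorted((w for w in rank if w != seed), key=rank.get, reverse=True)


# Toy dictionary (invented data): headword -> words in its definition.
definitions = {
    "fund": ["capital", "money"],
    "capital": ["fund", "money"],
    "money": ["currency"],
    "currency": ["money"],
    "loan": ["debt"],
}
ranked = candidate_synonyms(definitions, "fund")
```

In this sketch a word such as "loan", which neither defines nor is defined by the seed, never enters the graph, so the ranking is computed over related words only.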
This paper presents two methods for improving automatic synonym mining. The first applies the PageRank algorithm to dictionary definitions: we analyze the links between a given word and other words, construct the associated word graph, and then use PageRank to calculate the degree of similarity and discover synonyms within that graph. The second is a pattern matching algorithm based on the patterns of dictionary definitions: we formulate a set of mining rules manually, and the system then mines synonyms automatically by pattern matching. In addition, we use the pattern matching algorithm to mine synonyms from the Web and from the text of periodical articles in the field of economics. Mining experiments on financial dictionaries show that the precision of the PageRank algorithm and of the pattern matching algorithm reaches 85.6% and 90%, respectively. The test results indicate that the system is feasible and practical.
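The pattern matching method can be sketched in the same spirit. The abstract does not list the manually formulated rules, so the cue phrases below (又称, 也称, 亦称, 也叫, 简称, roughly "also called" or "abbreviated as") are assumed examples of how a synonym might be introduced in a Chinese dictionary definition, and `extract_synonyms` is a hypothetical name.

```python
import re

# Assumed cue phrases; the actual rule set used in the paper is not given.
# The optional quote after the cue and the stop characters (full-width
# punctuation and quotes) bound the captured term.
SYNONYM_PATTERN = re.compile(
    r"(?:又称|也称|亦称|也叫|简称)[“\"]?([^,。;、”\"]+)"
)


def extract_synonyms(headword, definition):
    """Return the terms that a cue phrase introduces in `definition`."""
    found = []
    for match in SYNONYM_PATTERN.finditer(definition):
        term = match.group(1).strip()
        if term and term != headword and term not in found:
            found.append(term)
    return found


# Invented example definitions, for illustration only.
example_1 = extract_synonyms("股票", "股份公司发行的所有权凭证,又称股份证书。")
example_2 = extract_synonyms("头寸", "也叫“款项”,指银行当前可以运用的资金。")
```

A rule of this kind is precise only when the cue phrase is reliably followed by a synonym, which is why such rules are typically written and checked by hand.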
Source
《情报理论与实践》
CSSCI
Peking University Core Journal (北大核心)
2006, Issue 4, pp. 472-475 (4 pages)
Information Studies: Theory & Application