
Automatic Recognition and Mining of Chinese Synonyms for Information Retrieval (article in English) Cited by: 5
Abstract: To improve the efficiency of automatic synonym mining, this paper proposes methods for automatically recognizing and mining synonyms from dictionary definitions, using a hyperlink analysis algorithm and a pattern matching algorithm to extract synonyms from two different angles. The first method treats the relation between a defining word and the word it defines as a link. For a given word, all words linked to it are assembled into a word graph in which each node is a related word and each arc is a defines/is-defined-by relation. Hyperlink analysis combined with the PageRank algorithm is then used to compute a PageRank value for each word, and this value is taken as a measure of semantic similarity between words. A candidate synonym set is generated for each word, and the best synonyms are recommended after filtering by a set of screening principles. The second method analyzes how words are defined, summarizes the patterns in which synonyms appear in dictionary definitions into manually written mining rules, and then recognizes and mines synonyms automatically by pattern matching. Pattern matching was also tested on Web pages and on the text of journal articles in economics. In mining tests on financial dictionaries, the PageRank algorithm and the pattern matching algorithm reached precisions of 85.6% and 90%, respectively. The test results indicate that automatic recognition and mining of synonyms by pattern matching and hyperlink analysis is feasible and practical.
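The two methods summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionary, the function names (`associated_graph`, `pagerank`, `synonyms_from_definition`), the damping factor, and the cue phrases (又称, 亦称, 也称, 简称, 俗称) are all assumptions chosen for illustration, since the abstract does not list the actual rules or parameters. The screening step that selects the best synonyms from the candidate set is not shown.

```python
import re

def associated_graph(definitions, word):
    """Subgraph of words that define `word` or are defined using it.

    `definitions` maps a headword to the words used in its definition
    (a hypothetical, pre-segmented toy dictionary).
    """
    neighbours = set(definitions.get(word, []))
    neighbours |= {w for w, d in definitions.items() if word in d}
    nodes = neighbours | {word}
    # Keep only arcs whose endpoints both lie inside the neighbourhood.
    return {u: [v for v in definitions.get(u, []) if v in nodes] for u in nodes}

def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over {node: [linked nodes]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing arcs is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

def candidate_synonyms(definitions, word):
    """Rank the neighbours of `word` by PageRank value, highest first."""
    rank = pagerank(associated_graph(definitions, word))
    return sorted((w for w in rank if w != word), key=rank.get, reverse=True)

# Assumed cue phrases ("also called", "abbreviated as", ...); illustrative only.
CUE_PATTERN = re.compile(r"(?:又称|亦称|也称|简称|俗称)([\u4e00-\u9fff]+)")

def synonyms_from_definition(definition):
    """Return the terms that directly follow a cue phrase in a definition."""
    return CUE_PATTERN.findall(definition)

# Hypothetical toy dictionary, with English words standing in for entries.
toy_definitions = {
    "stock": ["share", "company"],
    "share": ["stock", "company"],
    "equity": ["stock", "share"],
    "company": ["business"],
    "business": ["company"],
}
candidates = candidate_synonyms(toy_definitions, "stock")
matches = synonyms_from_definition("股票,又称股份证书,是股份公司发行的所有权凭证。")
```

In this sketch, frequently used defining words (such as "company" in the toy data) can receive a high PageRank value without being synonyms, which suggests why a filtering step like the one the abstract mentions is needed. The regular expression relies on punctuation to end the captured term, so it would over-capture in definitions where no punctuation follows the synonym.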
Authors: Lu Yong (陆勇), Hou Hanqing (侯汉清)
Source: Information Studies: Theory & Application (《情报理论与实践》), CSSCI, PKU Core Journal, 2006, No. 4, pp. 472-475 (4 pages)
Keywords: Chinese; synonyms; automatic recognition; automatic mining; pattern matching; PageRank algorithm; information retrieval


