
A Noise Cleaning Method for Synonym Extraction Results (同义词抽取结果的噪音清洗方法研究)
Abstract [Objective] Noise in synonym extraction results severely limits how usable those results are, so it needs to be cleaned beforehand. [Methods] This paper proposes a noise cleaning method based on a synonym relation graph. The method transforms the synonym extraction results into an undirected synonym relation graph, automatically identifies part of the noise within that graph, and is then improved by incorporating distributional semantic similarity to raise the proportion of noise identified. [Results] Experiments on terms randomly selected from the engineering and technology domain show that the method can filter out 32.6% to 73.0% of the noise in synonym extraction results. [Limitations] Only part of the noise can be removed, and the method needs further improvement to raise the accuracy of noise identification. [Conclusions] Building a synonym relation graph is a feasible way to clean noise from synonym extraction results, and the problem is worth further study.
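Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, 2015, No. 6, pp. 64-70 (7 pages)
Funding: One of the research outcomes of the National "Twelfth Five-Year" Science and Technology Support Program project "Mapping between the Chinese Thesaurus (Engineering and Technology Edition) and the English Super Sci-Tech Vocabulary" (Grant No. 2011BAH10B07)
Keywords: Synonym; Information extraction; Noise cleaning; Synonym relation graph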
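The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes, purely for illustration, that an extracted pair is structurally suspicious when its two terms share no common neighbour in the undirected graph, and that a pair is additionally suspicious when the cosine similarity of its context vectors falls below a threshold. The function names, the common-neighbour criterion, the `min_sim` threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_graph(pairs):
    """Turn extracted (term, candidate) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def cosine(u, v):
    """Cosine similarity between two sparse context-count vectors (dicts)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def flag_noise(pairs, contexts=None, min_sim=0.5):
    """Return the pairs flagged as noise.

    Assumed structural rule: a pair is noise if its terms share no neighbour.
    Assumed distributional rule (only when context vectors exist for both
    terms): a pair is also noise if its cosine similarity is below min_sim.
    """
    graph = build_graph(pairs)
    contexts = contexts or {}
    noise = []
    for a, b in pairs:
        unsupported = not (graph[a] & graph[b])
        low_sim = (
            a in contexts
            and b in contexts
            and cosine(contexts[a], contexts[b]) < min_sim
        )
        if unsupported or low_sim:
            noise.append((a, b))
    return noise


# Toy extraction output: three mutually linked terms plus one stray pair.
pairs = [
    ("car", "automobile"),
    ("car", "auto"),
    ("automobile", "auto"),
    ("car", "wheel"),
]
# Toy context counts; "auto" is given contexts unlike those of "car".
contexts = {
    "car": {"drive": 3, "road": 2},
    "automobile": {"drive": 2, "road": 1},
    "auto": {"automatic": 2},
    "wheel": {"round": 2, "rubber": 1},
}

structural_only = flag_noise(pairs)
with_similarity = flag_noise(pairs, contexts, min_sim=0.5)
```

In this sketch the structural pass flags only the stray pair, while adding the similarity check flags more pairs, which mirrors the abstract's statement that distributional similarity is used to raise the proportion of noise identified.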
