Abstract
[Objective] Noise in synonym extraction results seriously limits their usability, so the results need to be cleaned before use. [Methods] This paper proposes a noise-cleaning method based on a synonym relation graph. The method first transforms the synonym extraction results into an undirected synonym relation graph, then automatically identifies part of the noise within that graph. The method is further improved by incorporating distributional semantic similarity, which raises the proportion of noise identified. [Results] Experiments on terms randomly selected from the engineering and technology domain show that the method filters out 32.6% to 73.0% of the noise in the synonym extraction results. [Limitations] Only part of the noise is removed, and the method needs further improvement to raise the accuracy of noise identification. [Conclusions] Constructing a synonym relation graph is a feasible way to clean noise from synonym extraction results, and the problem merits further study.
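The abstract does not spell out how noise is detected in the graph, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It builds an undirected graph from extracted synonym pairs and flags an edge as suspect when both endpoints have other neighbors but share none of them. That heuristic, the function names `build_graph` and `flag_suspect_edges`, and the sample terms are all hypothetical illustrations.

```python
from collections import defaultdict


def build_graph(pairs):
    """Turn extracted synonym pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-pairs
            graph[a].add(b)
            graph[b].add(a)
    return graph


def flag_suspect_edges(graph):
    """Flag edges that link two terms with no common neighbor.

    Assumed heuristic (not taken from the paper): true synonyms tend to
    share other synonyms, so an edge whose endpoints each have further
    neighbors but none in common is treated as possible noise.
    """
    suspects = set()
    for a in graph:
        for b in graph[a]:
            if a < b:  # visit each undirected edge once
                others_a = graph[a] - {b}
                others_b = graph[b] - {a}
                if others_a and others_b and not (others_a & others_b):
                    suspects.add((a, b))
    return suspects


# Hypothetical extraction output: two tight synonym groups and one noisy pair.
pairs = [
    ("car", "automobile"), ("car", "auto"), ("automobile", "auto"),
    ("engine", "motor"), ("engine", "powerplant"), ("motor", "powerplant"),
    ("car", "engine"),  # the noisy pair joining the two groups
]
suspects = flag_suspect_edges(build_graph(pairs))
print(sorted(suspects))
```

A distributional-similarity score, as the abstract mentions, could be used as a second filter on the flagged edges; how the paper combines the two signals is not stated here.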
Source
《现代图书情报技术》
CSSCI
2015, Issue 6, pp. 64-70 (7 pages)
New Technology of Library and Information Service
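Funding
One of the research outputs of the project "Research on Mapping Between the Chinese Thesaurus (《汉语主题词表》, Engineering and Technology Edition) and the English Super Science and Technology Thesaurus", funded by the National Science and Technology Support Program of the Twelfth Five-Year Plan (Grant No. 2011BAH10B07).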
Keywords
Synonym; Information extraction; Noise cleaning; Synonym relation graph