Abstract
Lexical chains help computers interpret word senses correctly and grasp the main idea of a document, and are therefore widely used in fields such as information retrieval, text mining, and machine translation. This paper presents a HowNet-based algorithm for extracting Chinese lexical chains. The algorithm uses the HowNet dictionary to standardize word senses, and computes the semantic similarity between words to determine each word's sense in its specific context. To improve both the accuracy of the word senses in the chains and the speed of chain extraction, the algorithm uses a non-greedy strategy to determine word senses and a greedy strategy to build the chains. Experimental results show that the algorithm is effective.
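The abstract does not give the procedure in detail, so the following is only a minimal sketch of the two-stage idea it describes, under stated assumptions. The sense inventory `SENSES`, the Jaccard-based `similarity` function, the exhaustive joint search in `disambiguate`, and the `threshold` parameter are all hypothetical stand-ins; none of them is HowNet data or the authors' actual similarity measure or search procedure.

```python
from itertools import product

# Hypothetical toy sense inventory: word -> {sense id: set of semantic features}.
# This is an illustrative stand-in, not real HowNet data.
SENSES = {
    "bank": {"bank#finance": {"institution", "money"},
             "bank#river": {"land", "water"}},
    "loan": {"loan#finance": {"money", "debt"}},
    "deposit": {"deposit#finance": {"money", "account"},
                "deposit#geology": {"land", "mineral"}},
    "river": {"river#water": {"water", "land"}},
}


def features(sense):
    """Look up the feature set of a sense id of the form 'word#label'."""
    word = sense.split("#")[0]
    return SENSES[word][sense]


def similarity(a, b):
    """Jaccard overlap of two feature sets (assumed stand-in measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(words):
    """Non-greedy sense selection: score every joint assignment of senses
    over the whole context and keep the best one, instead of fixing each
    word's sense as soon as the word is encountered."""
    best, best_score = None, -1.0
    for combo in product(*(SENSES[w] for w in words)):
        score = sum(
            similarity(features(x), features(y))
            for i, x in enumerate(combo)
            for y in combo[i + 1:]
        )
        if score > best_score:
            best, best_score = combo, score
    return list(best)


def build_chains(senses, threshold=0.2):
    """Greedy chain building: each sense joins the first existing chain
    that has a sufficiently similar member, otherwise it starts a new chain."""
    chains = []
    for s in senses:
        for chain in chains:
            if any(similarity(features(s), features(m)) >= threshold for m in chain):
                chain.append(s)
                break
        else:
            chains.append([s])
    return chains


senses = disambiguate(["bank", "loan", "deposit"])
print(senses)
print(build_chains(senses))
```

In this sketch the exhaustive search grows exponentially with the number of ambiguous words, so it only illustrates what "non-greedy" means here; a practical system would need a bounded context window or a cheaper search.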
Source
Software Guide (《软件导刊》)
2008, No. 10, pp. 51-53 (3 pages in total)
Funding
Nanjing University of Aeronautics and Astronautics Talent Introduction Fund (1009-234039)
Keywords
Lexical Chain
Semantic Similarity
HowNet