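Abstract
[Objective] This paper uses distributional semantic relatedness to compute lexical cohesion relations. The goal is to address problems in current lexical chain construction, such as insufficient depth when detecting relations between words, and so to improve the quality of the constructed lexical chains. [Methods] We first review the techniques used to construct lexical chains. We then compute the semantic relatedness between language units in a text from WordNet's lexical relations, and compute the latent semantic relations between those units with the Distributional Memory model. The two kinds of semantic relations are combined to build a lexical-chain text representation model. Building on this theoretical work, we run comparative experiments on scientific papers from the medical domain. [Results] For describing text topics, the lexical chains built by our method outperform those built by the non-greedy algorithm, with a running time comparable to that of the non-greedy algorithm. [Limitations] The algorithm is still time-consuming; lexical cohesion relations are not fully taken into account; and the method has only been validated for topic identification in medical scientific literature, so it still needs to be verified in other domains. [Conclusions] Distributional semantic relatedness can identify latent semantics, is also quite helpful for building lexical chains from multi-word phrases, and can effectively improve lexical chain construction.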
[Objective] This paper uses distributional semantics to build high-quality lexical chains. [Methods] First, we built an algorithm that uses WordNet to compute the semantic relations among the language units of a text. Second, we adopted the Distributional Memory model to compute their latent semantic relations. Finally, we combined these relations to build the lexical chains, which were evaluated on papers from the medical sciences. [Results] The proposed algorithm outperformed the non-greedy method in describing the papers' topics. [Limitations] The efficiency of the algorithm needs to be improved, and the method should also be evaluated on papers from other fields. [Conclusions] The proposed model can detect latent semantic relations and thereby improve the quality of lexical chains built with phrases.
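The abstract does not give the formula for combining the two relation types or the exact chaining procedure, so the following is only an illustrative sketch under stated assumptions. It assumes a linear combination `alpha * wordnet + (1 - alpha) * distributional` with a hypothetical weight `alpha` and a hypothetical joining `threshold`. A tiny hand-written dictionary (`WORDNET_RELATED`) stands in for WordNet, and toy vectors (`VECTORS`) compared by cosine similarity stand in for the Distributional Memory model. Chains are built by simple greedy assignment; this is not the authors' algorithm, which the abstract compares against a non-greedy baseline. All names here are hypothetical.

```python
import math

# Hypothetical stand-in for WordNet relations (synonym/hypernym links).
WORDNET_RELATED = {
    "tumor": {"neoplasm", "cancer"},
    "neoplasm": {"tumor"},
    "cancer": {"tumor"},
    "therapy": {"treatment"},
    "treatment": {"therapy"},
}

# Hypothetical toy vectors standing in for Distributional Memory scores.
VECTORS = {
    "tumor": [0.9, 0.1, 0.0],
    "neoplasm": [0.85, 0.15, 0.0],
    "cancer": [0.8, 0.2, 0.1],
    "chemotherapy": [0.6, 0.7, 0.1],
    "therapy": [0.2, 0.9, 0.1],
    "treatment": [0.25, 0.85, 0.1],
    "hospital": [0.1, 0.3, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def wordnet_score(w1, w2):
    """1.0 if the toy thesaurus links the words (in either direction), else 0.0."""
    if w1 == w2:
        return 1.0
    linked = w2 in WORDNET_RELATED.get(w1, set()) or w1 in WORDNET_RELATED.get(w2, set())
    return 1.0 if linked else 0.0


def distributional_score(w1, w2):
    """Cosine similarity of the toy vectors; 0.0 if a word has no vector."""
    if w1 in VECTORS and w2 in VECTORS:
        return cosine(VECTORS[w1], VECTORS[w2])
    return 0.0


def relatedness(w1, w2, alpha=0.5):
    """Assumed linear mix of thesaurus and distributional relatedness."""
    return alpha * wordnet_score(w1, w2) + (1 - alpha) * distributional_score(w1, w2)


def build_chains(words, threshold=0.4, alpha=0.5):
    """Greedily attach each word to the most related chain, or start a new one."""
    chains = []
    for w in words:
        best, best_score = None, 0.0
        for i, chain in enumerate(chains):
            # A word's affinity to a chain is its strongest link to any member.
            score = max(relatedness(w, m, alpha) for m in chain)
            if score >= threshold and (best is None or score > best_score):
                best, best_score = i, score
        if best is None:
            chains.append([w])
        else:
            chains[best].append(w)
    return chains


if __name__ == "__main__":
    text = ["tumor", "therapy", "neoplasm", "chemotherapy", "hospital", "treatment", "cancer"]
    for chain in build_chains(text):
        print(chain)
```

With the toy data, "chemotherapy" has no thesaurus link at all but still joins the "therapy" chain through its distributional similarity alone. This is the kind of latent relation the abstract says a purely dictionary-based method would miss.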
Authors
Qu Yunpeng (曲云鹏)
Wang Wenling (王文玲)
Affiliations: University of Chinese Academy of Sciences, Beijing 100049, China; National Science Library, Chinese Academy of Sciences, Beijing 100190, China; National Library of China, Beijing 100081, China
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
2016, No. 9, pp. 34-41 (8 pages)