Abstract
[Objective] This paper adds a topic layer between the document layer and the word layer in order to develop a new method for calculating word similarity. [Methods] We describe a topic definition and representation model based on formal concept analysis (FCA), map words to the topic layer, and propose a method that quantifies word similarity through topic similarity. [Results] Taking information retrieval as the example field, we evaluated the method on papers from the SIGIR conference, 2006 to 2016. The precision and recall of the proposed method were significantly higher than those of the FastText method, with maximum improvements of 30% and 21%, respectively. [Limitations] The method depends on the quality of the key feature words extracted from the documents. [Conclusions] The FCA-based method makes effective use of the semantic relations among the topics associated with words and better reflects the relatedness between words.
Authors
Liu Ping (刘萍); Peng Xiaofang (彭小芳)
School of Information Management, Wuhan University, Wuhan 430072, China; Institute for Digital Library, Wuhan University, Wuhan 430072, China
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSSCI
CSCD
PKU Core Journals (北大核心)
2020, Issue 5, pp. 66-74 (9 pages)
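Funding
This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on an Interactive Information Retrieval System Based on Personalized Knowledge Maps: From the Perspective of User Cognition" (Grant No. 71573196).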
Keywords
Word Similarity
Formal Concept Analysis
Concept Lattice
Topic