Abstract
Semantic relatedness computation is a fundamental problem in data science, with wide application in information retrieval and natural language processing. To address the limitations of the ESA (Explicit Semantic Analysis) algorithm, an explicit semantic feature selection algorithm is proposed and used to construct a low-dimensional semantic space. On this basis, a method for computing semantic relatedness in the low-dimensional explicit semantic space is proposed, using the mapping information of feature concepts in Wikipedia. The method addresses the loss of accuracy that ESA suffers in subsequent relatedness computation because of its high-dimensional, sparse space. Experimental results show that, compared with other current methods, the results of the proposed method agree better with human judgments as measured by Pearson's correlation coefficient (P) and Spearman's correlation coefficient (S).
Authors
李璞
蒋锦涛
张志锋
申红雪
梁辉
唐慧
LI Pu; JIANG Jintao; ZHANG Zhifeng; SHEN Hongxue; LIANG Hui; TANG Hui (Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou 450000, China)
Source
《信阳师范学院学报(自然科学版)》
CAS
PKU Core Journal (北大核心)
2019, No. 4, pp. 675-682 (8 pages)
Journal of Xinyang Normal University(Natural Science Edition)
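Funding
National Natural Science Foundation of China (61802352, 61872439)
Doctoral Research Fund of Zhengzhou University of Light Industry (0215/13501050015)
Young Backbone Teacher Training Program of Zhengzhou University of Light Industry (2018XGGJS006)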
Keywords
explicit semantic space
low-dimensional semantic space
feature selection
semantic relatedness