Abstract
The scientific data of solar-terrestrial space science are large in volume, diverse in type, and complex in structure, and the correlations among different domain concepts and events place high demands on scientific data retrieval in this field. However, the retrieval modules of the mainstream data sharing and publishing systems in this field are still built mainly on conventional keyword-based retrieval, which seriously limits the quality of search results. We present a semantic retrieval model for solar-terrestrial space science data. Starting from semantic information extracted from the scientific metadata of each dataset, we apply conventional text-processing methods to build a TF-IDF weighted term-document matrix. Latent semantic indexing (LSI) then analyzes this matrix to compute the semantic relevance between a search request and each dataset, and datasets are recommended to the user ranked by this relevance. Experimental comparisons show that the model achieves a clearly higher recall rate than conventional methods while maintaining high precision. The model also supports semantic annotation and keyword extraction for scientific data, and it can be applied to scientific data retrieval in other disciplines as well.
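The pipeline summarized above (dataset metadata, then a TF-IDF term-document matrix, then latent semantic indexing, then similarity ranking) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the smoothed IDF formula, folding a query into the latent space as Σ_k^-1 U_k^T q, ranking by cosine similarity, the class name `LsiIndex`, and the sample metadata strings are all hypothetical choices made for this example. The paper's own metadata extraction, term weighting, and choice of k may differ.

```python
import math
import re
from collections import Counter

import numpy as np


def tokenize(text: str) -> list[str]:
    """Lower-case and split on non-alphanumeric runs (an illustrative assumption)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class LsiIndex:
    """Hypothetical LSI index over metadata strings, one string per dataset."""

    def __init__(self, metadata: list[str], k: int = 2) -> None:
        token_lists = [tokenize(m) for m in metadata]
        self.vocab = sorted({t for toks in token_lists for t in toks})
        self.term_index = {t: i for i, t in enumerate(self.vocab)}
        n_docs = len(metadata)
        # Document frequency of each term, then a smoothed IDF (assumed formula)
        # so that no term weight is exactly zero.
        df = Counter(t for toks in token_lists for t in set(toks))
        self.idf = np.array(
            [math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in self.vocab]
        )
        # Term-document matrix A (terms x datasets) with TF-IDF weights.
        a = np.zeros((len(self.vocab), n_docs))
        for j, toks in enumerate(token_lists):
            for t, count in Counter(toks).items():
                a[self.term_index[t], j] = count
        a *= self.idf[:, None]
        # Truncated SVD: A is approximated by U_k diag(s_k) V_k^T.
        u, s, vt = np.linalg.svd(a, full_matrices=False)
        k = min(k, int(np.sum(s > 1e-10)))  # never keep zero singular values
        self.u_k = u[:, :k]
        self.s_k = s[:k]
        self.doc_vecs = vt[:k, :].T  # row j: dataset j in the k-dim latent space

    def fold_in(self, query: str) -> np.ndarray:
        """Map a query into the latent space: q_hat = diag(s_k)^-1 U_k^T q."""
        q = np.zeros(len(self.vocab))
        for t, count in Counter(tokenize(query)).items():
            if t in self.term_index:  # terms absent from all metadata are ignored
                q[self.term_index[t]] = count
        q *= self.idf
        return (self.u_k.T @ q) / self.s_k

    def rank(self, query: str) -> list[tuple[int, float]]:
        """Return (dataset index, cosine similarity) pairs, most similar first."""
        q_hat = self.fold_in(query)
        q_norm = np.linalg.norm(q_hat)
        d_norms = np.linalg.norm(self.doc_vecs, axis=1)
        scores = np.zeros(len(d_norms))
        # Datasets whose latent vector is numerically zero get similarity 0.
        valid = d_norms > 1e-9
        if q_norm > 1e-9:
            scores[valid] = (self.doc_vecs[valid] @ q_hat) / (d_norms[valid] * q_norm)
        order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
        return [(j, float(scores[j])) for j in order]


if __name__ == "__main__":
    # Hypothetical sample metadata (not taken from the paper).
    metadata = [
        "solar wind plasma density velocity ACE spacecraft",
        "geomagnetic storm Dst index ground magnetometer",
        "solar wind magnetic field interplanetary IMF",
        "ionosphere total electron content GPS receiver",
    ]
    index = LsiIndex(metadata, k=2)
    print(index.rank("IMF"))
```

With k = 2 in the demo, the query `IMF` should also give dataset 0 a high score even though its metadata never mentions IMF, because it shares "solar wind" with dataset 2; this co-occurrence effect is what separates LSI from plain keyword matching. The value of k trades noise reduction against loss of detail and would need tuning on real metadata.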
Source
Journal of University of Chinese Academy of Sciences (《中国科学院大学学报(中英文)》)
CSCD
Peking University Core Journal
2016, Issue 5, pp. 711-719 (9 pages)
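Funding
Informatization Special Program of the Chinese Academy of Sciences (XXH12504-08)
Strategic Priority Research Program of the Chinese Academy of Sciences (XDA04080000)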