Abstract
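With advances in information technology, applications such as online data sharing have become a research focus. Existing materials experiment data are stored as complex tables that are difficult to convert directly into two-dimensional tables; the structures and storage formats of the data vary widely; and the data are difficult to share. To enable sharing among heterogeneous data in the materials domain, this paper proposes a rule-based ontology generation scheme that generates ontologies from complex tables. Generating an ontology from a complex table is five times faster than parsing the complex table and loading it into a database. To achieve data sharing, this paper proposes using ontology instance matching to find similar information. Common matching tools give poor instance-matching results on materials experiment ontologies. This paper analyzes the reasons and, in view of the current state of data sources in the materials domain, proposes two improved matching schemes based on the TF-IDF algorithm, which improve matching results when domain knowledge and dictionaries are lacking. The work explores an implementation route for building the entire materials data ecosystem. Compared with existing common instance-matching tools, its results on materials experiment data are more suitable.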
With the development of information technology, applications such as online data sharing have become increasingly popular. The many data forms found in materials experiment data sets make information hard to integrate and make it harder to discover relationships among sources. To address this data-sharing problem, this paper proposes a rule-based automatic algorithm that transforms the various complex tables used in materials research into ontology information. It also proposes an instance-matching method based on the TF-IDF algorithm, together with two improved schemes. The experimental results indicate that existing ontology matching tools work well with the generated ontologies, which are produced approximately five times faster than databases are generated from the same complex tables. The common tools, however, do not work well for instance matching. This paper analyzes the reason and proposes an improved matching scheme based on the TF-IDF algorithm, suited to the current state of data sources in the materials field, which lack domain knowledge and dictionaries. The method explores an implementation route for building the entire materials data ecosystem. In this setting, its experimental results are more suitable than those of the common tools.
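As a rough illustration of the TF-IDF-based instance matching mentioned in the abstract, the sketch below scores pairs of instance descriptions by the cosine similarity of their TF-IDF vectors. It is plain TF-IDF only, not either of the paper's two improved schemes, which the abstract does not detail. The function names, the whitespace tokenizer, the smoothed IDF formula, the 0.3 threshold, and the sample strings are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real data may need more.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of token -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each token once per doc
    # Smoothed IDF (an assumption; other variants are equally valid).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def match_instances(source, target, threshold=0.3):
    """For each source instance, return (src_idx, tgt_idx, score) for its
    best-scoring target instance, if the score reaches the threshold."""
    vecs = tfidf_vectors(source + target)  # IDF computed over both sets
    src, tgt = vecs[:len(source)], vecs[len(source):]
    matches = []
    for i, sv in enumerate(src):
        best_j, best = max(
            ((j, cosine(sv, tv)) for j, tv in enumerate(tgt)),
            key=lambda p: p[1],
            default=(None, 0.0),
        )
        if best_j is not None and best >= threshold:
            matches.append((i, best_j, best))
    return matches


# Hypothetical instance descriptions, invented for this example.
source = ["tensile strength test steel alloy 300 MPa",
          "thermal conductivity copper sample"]
target = ["copper sample thermal conductivity measurement",
          "steel alloy tensile strength 300 MPa result"]
pairs = [(i, j) for i, j, _ in match_instances(source, target)]
```

Because the score depends only on token overlap weighted by rarity, this kind of matcher needs no domain dictionary, which is the setting the abstract describes.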
Authors
Ma Zhiyuan (马致远)
Cao Min (曹旻)
MA Zhiyuan; CAO Min; School of Computer Engineering and Science, Shanghai University
Source
Journal of Fudan University (Natural Science) (《复旦学报(自然科学版)》)
CAS
CSCD
Peking University Core Journals (北大核心)
2018, No. 5, pp. 565-579 (15 pages)
Journal of Fudan University: Natural Science
Funding
Project supported by the Shanghai Municipal Science and Technology Commission (15DZ2260300)