Abstract
Following the data warehouse architectural pattern, heterogeneous patent data distributed across different locations and systems are integrated into a single data center, which facilitates unified storage, access, and analysis of patent data. The details of the heterogeneous patent data in every data source are encapsulated behind a unified, transparent access interface, so users need only focus on their own access requirements rather than on the structural differences among the underlying data sources. Several data extraction methods, including the snapshot, trigger, log, timestamp, and shadow table methods, are combined so that the strengths of each offset the weaknesses of the others, which improves the efficiency of patent data integration. Taking the timestamp method as an example, incremental patent data extraction is implemented with the Kettle tool to verify the approach proposed in the article.
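The timestamp method named in the abstract can be illustrated with a minimal sketch. This is not the authors' Kettle transformation: it swaps in plain Python with the standard-library `sqlite3` module, and the table names `patent` and `etl_watermark`, the column `updated_at`, and the sample rows are all hypothetical. The idea shown is only the general one: remember the newest timestamp already loaded, then pull just the rows modified after it.

```python
import sqlite3


def extract_increment(source, target):
    """Copy rows changed since the stored watermark from source to target.

    Assumes (hypothetically) that every source row carries an `updated_at`
    timestamp in ISO text form, so string comparison matches time order.
    """
    # Look up the timestamp of the newest row loaded so far.
    row = target.execute(
        "SELECT last_ts FROM etl_watermark WHERE name = 'patent'"
    ).fetchone()
    last_ts = row[0] if row else ""  # empty string sorts before any timestamp

    # Pull only the rows modified after the watermark.
    changed = source.execute(
        "SELECT id, title, updated_at FROM patent "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_ts,),
    ).fetchall()
    if not changed:
        return 0

    # Upsert the changed rows, then advance the watermark to the newest one.
    target.executemany(
        "INSERT OR REPLACE INTO patent (id, title, updated_at) VALUES (?, ?, ?)",
        changed,
    )
    target.execute(
        "INSERT OR REPLACE INTO etl_watermark (name, last_ts) VALUES ('patent', ?)",
        (changed[-1][2],),
    )
    target.commit()
    return len(changed)


def make_demo():
    """Build in-memory source and target databases with made-up sample rows."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    schema = "CREATE TABLE patent (id INTEGER PRIMARY KEY, title TEXT, updated_at TEXT)"
    source.execute(schema)
    target.execute(schema)
    target.execute("CREATE TABLE etl_watermark (name TEXT PRIMARY KEY, last_ts TEXT)")
    source.executemany(
        "INSERT INTO patent VALUES (?, ?, ?)",
        [
            (1, "Gear pump", "2020-01-01 08:00:00"),
            (2, "Robot gripper", "2020-01-02 09:30:00"),
        ],
    )
    source.commit()
    return source, target
```

Calling `extract_increment` a second time without touching the source copies nothing, which is the property that makes the extraction incremental. A sketch of this kind does not detect deleted source rows, a known limitation of timestamp-based extraction in general.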
Authors
郑皓
许琦
ZHENG Hao; XU Qi (Taizhou Informatization Application Technology Collaborative Innovation Center for Small and Medium-sized Enterprises, Taizhou Vocational and Technical College; Zhejiang Collaborative Innovation Center of Industrial Robot and Intelligent Manufacturing Line Integration Promotion Application, Taizhou 318000, China)
Source
《科技创新发展战略研究》
2020, No. 3, pp. 14-17 (4 pages)
Strategy for Innovation and Development of Science and Technology
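Funding
Taizhou University Student Science and Technology Innovation Project (Category I), "Research on a Patent Information Analysis System Based on Data Warehouse" (Tai Jiao Gao [2018] No. 202)
Taizhou Vocational and Technical College University Student Science and Technology Innovation Project, "Research on a Patent Information Analysis System Based on Data Warehouse" (2019DKC11).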