Abstract
Most existing Web analysis techniques focus on the textual content of Web data while ignoring the structural information in the data itself. This paper introduces CWI, a new tool for analyzing and querying massive data. As part of CWI, TLGM and TLGM-QL support query and analysis over both the content and the structure of Web data. TLGM is implemented in a distributed environment and models the Web as a graph, storing both the content and the structure of the Web data. On top of this storage model, the four basic operators of TLGM-QL and the reconstruction of graphs are implemented. Experiments demonstrate that the storage framework has good balance and scalability.
Source
《广西师范大学学报(自然科学版)》
CAS
Peking University Core Journals (北大核心)
2009, Issue 1, pp. 125-128 (4 pages)
Journal of Guangxi Normal University:Natural Science Edition
Funding
Key Program of the National Natural Science Foundation of China (60833003)
Keywords
distributed storage
graph data
index
load balance