Abstract
With the rapid development of cloud computing, the volume of information is growing explosively. Cheap cloud storage and computing power have accelerated the generation of big data and have made it necessary to solve the problems of collecting and retrieving information from it. More than 50% of big data is unstructured, so the vast majority of it is stored as files. Big data is divided into many blocks that are stored on block servers, and the corresponding metadata is generated and stored on the master server. This article discusses how to collect the web URLs and keywords of big data and how to retrieve information from it.
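The storage and retrieval layout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MasterServer`, the tiny `BLOCK_SIZE`, the round-robin block placement, and the example URLs are all assumptions introduced only for illustration. The sketch splits a file into blocks, records on a master where each block would live, and keeps a keyword-to-URL index for retrieval.

```python
from collections import defaultdict

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow (assumption)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file's contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class MasterServer:
    """Hypothetical master: holds only metadata, never the block contents."""

    def __init__(self, num_block_servers: int) -> None:
        self.num_block_servers = num_block_servers
        # url -> list of block-server ids, one entry per block
        self.block_locations: dict[str, list[int]] = {}
        # keyword -> set of urls whose content was tagged with that keyword
        self.keyword_index: dict[str, set[str]] = defaultdict(set)

    def register(self, url: str, data: bytes, keywords: list[str]) -> list[bytes]:
        """Record metadata for one file and return its blocks for storage."""
        blocks = split_into_blocks(data)
        # Round-robin placement is an assumption, not taken from the article.
        self.block_locations[url] = [
            i % self.num_block_servers for i in range(len(blocks))
        ]
        for kw in keywords:
            self.keyword_index[kw.lower()].add(url)
        return blocks

    def search(self, keyword: str) -> list[str]:
        """Return the URLs collected under a keyword, case-insensitively."""
        return sorted(self.keyword_index.get(keyword.lower(), set()))
```

For example, after registering two files tagged with "cloud", `search("cloud")` would return both URLs, while the block contents themselves would be stored on the block servers listed in `block_locations`.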
Authors
WU Xue-qin (吴雪琴)
SHU Xiao-ling (舒晓苓)
Computer Department of Sichuan TOP IT Vocational Institute, Chengdu 611743, China
Source
Computer Knowledge and Technology (《电脑知识与技术》)
2014, No. 4, pp. 2388-2390 (3 pages)
Keywords
cloud computing
big data
information collection
retrieval mechanism