Abstract
Social media information is part of the information on the Internet and has become a resource that cannot be ignored in the Web 2.0 era. It records the everyday behavior of the public, and the valuable portion of it can be preserved long term as social media archives, because it is an important component of social memory. However, much social media information is lost before it can be effectively preserved. By summarizing and analyzing the research hotspots in Internet archives studies in recent years, this paper concludes that research on the collection of social media archives that takes content as its object of study remains weak. Drawing on traditional archival collection methods and taking the characteristics of social media into account, the paper outlines the collection scope and methods for social media archives. Finally, it proposes a collection process for content-based social media archives built on web crawler technology.
Authors
Shang Shan (尚珊)
Shi Yaling (施亚玲)
Source
Lantai World (《兰台世界》)
2019, No. 2, pp. 24-29 (6 pages)
Keywords
social media archives
content
collection
process