RESEARCH ON DISTRIBUTED DANMAKU CRAWLING FOR CHINESE LIVE-STREAMING SERVICES (cited by: 5)
Abstract: Danmaku services have grown rapidly in recent years on the back of the booming online video and live-streaming industries. The danmaku environment on mainstream services, however, has long lacked oversight: rule violations by streamers and viewers persist despite repeated bans and have drawn negative publicity for the industry, while academic research on live-streaming danmaku remains scarce. A solution for collecting and processing danmaku is therefore urgently needed. Based on the technical characteristics of well-known Chinese danmaku services, a distributed live-streaming danmaku crawler system is designed, together with mechanisms for establishing room connections and for collecting danmaku. For services that expose an open API, a lightweight client is implemented directly. For services that are built on Adobe Flash and do not expose an API, Electron, which is based on the Chromium browser, simulates browsing of the live-room page, and its PPAPI plugin interface implementation is modified to side-channel the Flash network traffic so that danmaku can be captured. Validation experiments on a well-known danmaku platform show that the system can schedule IP address resources for relatively large-scale crawling and performs well, handling danmaku traffic that averages 134 messages per second and peaks at more than 1,000 messages per second.
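The abstract states that the system schedules IP address resources across rooms but does not describe the policy. As an illustration only, the following is a minimal sketch that assumes a least-loaded assignment with a per-IP connection cap; the function name `assign_rooms`, the `max_per_ip` parameter, and the sample addresses are hypothetical and are not taken from the paper.

```python
import heapq
from typing import Dict, Iterable, List


def assign_rooms(rooms: Iterable[str], ips: List[str], max_per_ip: int) -> Dict[str, List[str]]:
    """Assign each room to the currently least-loaded IP (hypothetical policy).

    Each IP carries at most max_per_ip room connections. This is an assumed
    scheduling rule for illustration, not the method used in the paper.
    """
    if max_per_ip < 1:
        raise ValueError("max_per_ip must be at least 1")

    # Heap entries are (current load, ip); ties are broken by the ip string.
    heap = [(0, ip) for ip in ips]
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {ip: [] for ip in ips}

    for room in rooms:
        if not heap:
            # Every IP has reached its cap, so the room cannot be placed.
            raise RuntimeError("not enough IP capacity for all rooms")
        load, ip = heapq.heappop(heap)
        plan[ip].append(room)
        # Only IPs that still have spare capacity go back on the heap.
        if load + 1 < max_per_ip:
            heapq.heappush(heap, (load + 1, ip))
    return plan


if __name__ == "__main__":
    plan = assign_rooms(["r1", "r2", "r3", "r4", "r5"], ["10.0.0.1", "10.0.0.2"], max_per_ip=3)
    for ip, assigned in plan.items():
        print(ip, assigned)
```

A cap per address is one plausible way to keep any single IP below a platform's connection limit; the actual limit and the real dispatching rule would have to come from the full paper.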
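Authors: 王雪瑞 (Wang Xuerui), 刘渊 (Liu Yuan)
Source: 《计算机应用与软件》 (Computer Applications and Software), Peking University Core Journal, 2018, No. 2, pp. 134-140 (7 pages)
Funding: National Science and Technology Support Program (2015BAH54F00); National Key R&D Program of China (2016YFB0800305)
Keywords: live-streaming danmaku; crawler; browser simulation; PPAPI side-channeling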
