Abstract
With the continued growth of the big-data era, data acquisition and analysis have become a focus of attention, and Python-based web crawling is among the most widely used tools in data analysis work. This paper applies key Python crawling techniques to collect data on the film rankings and currently showing films from the Maoyan movie site (猫眼电影网), and analyzes the data in Spyder, a Python development environment. NumPy is used to store and process the large data sets, the Chinese word-segmentation tool Jieba segments the crawled text, and the SnowNLP library performs sentiment analysis on that text. The results are presented as word clouds and dynamic web charts showing audience sentiment and film rating statistics, to support users in deciding what to watch.
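The first stage described in the abstract, extracting film titles and ratings from a ranking page, might look roughly like the sketch below. It is only an illustration under stated assumptions: the HTML sample, the `name` and `score` class names, and the `BoardParser` and `parse_board` helpers are all hypothetical and are not taken from the paper or from the actual Maoyan site. It uses only the Python standard library and parses an in-memory string rather than fetching a live page.

```python
from html.parser import HTMLParser
from statistics import mean

# Hypothetical ranking-page markup, invented for illustration only.
SAMPLE_HTML = """
<dl class="board">
  <dd><p class="name">Film A</p><p class="score">9.5</p></dd>
  <dd><p class="name">Film B</p><p class="score">8.8</p></dd>
  <dd><p class="name">Film C</p><p class="score">9.1</p></dd>
</dl>
"""


class BoardParser(HTMLParser):
    """Collect (title, score) pairs from <p class="name"> and <p class="score">."""

    def __init__(self):
        super().__init__()
        self.films = []      # list of (title, score) tuples
        self._field = None   # "name" or "score" while inside a matching <p>
        self._title = None   # title waiting for its score

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "p" and cls in ("name", "score"):
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "p":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._title = text
        elif self._title is not None:
            # A score closes the record opened by the preceding title.
            self.films.append((self._title, float(text)))
            self._title = None


def parse_board(html):
    """Return the (title, score) pairs found in the given HTML string."""
    parser = BoardParser()
    parser.feed(html)
    parser.close()
    return parser.films


films = parse_board(SAMPLE_HTML)
average_score = round(mean(score for _, score in films), 2)
```

In a fuller pipeline, the extracted records would then feed the later stages the abstract names (word segmentation, sentiment scoring, and chart generation), which are not shown here.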
Authors
徐勤亚
蔡继鹏
王星
XU Qin-ya;CAI Ji-peng;WANG Xing
Source
《信息技术与信息化》
2019, Issue 8, pp. 113-115 (3 pages)
Information Technology and Informatization