Deep Web Crawlers and Data Analysis Based on Reverse Technology (Cited by: 1)
Abstract: In the era of big data, demand for data collection is growing across industries. Collecting data from websites that protect their request parameters with JavaScript encryption is a widespread need, but it faces many bottlenecks. This paper uses JavaScript reverse-engineering crawler techniques to restore the parameter encryption process and dynamically construct the Uniform Resource Locator (URL) for product reviews on a shopping website. This enables dynamic collection of review data for multiple products within a specified category and offers a new approach to collecting similarly encrypted data. Sentiment analysis of the collected LEGO review data with SnowNLP (a Python library for Chinese natural language processing, NLP) shows that about 66% of buyers gave the product positive reviews. The sentiment scores are polarized: high scores cluster between 0.8 and 1.0, and low scores between 0.0 and 0.2. Word cloud analysis shows that buyers pay considerable attention to the appearance of the shipping packaging. These findings can serve as a reference for online sellers seeking to improve their business management.
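The abstract reports a share of positive reviews and a polarized score distribution. Below is a minimal Python sketch of how such statistics could be tallied, assuming each review has already been scored with SnowNLP, whose `SnowNLP(text).sentiments` attribute gives the estimated probability (0 to 1) that a text is positive. The 0.5 cutoff for "positive" and the five 0.2-wide bins are assumptions for illustration; the abstract does not state the paper's cutoff or binning.

```python
from collections import Counter
from typing import Iterable

# Scores would typically come from SnowNLP, e.g.:
#   from snownlp import SnowNLP
#   scores = [SnowNLP(text).sentiments for text in comments]


def summarize_sentiments(scores: Iterable[float], threshold: float = 0.5) -> dict:
    """Summarize sentiment scores in [0, 1].

    `threshold` is an assumed cutoff: scores strictly above it count as positive.
    Returns the positive share and a 5-bin histogram of the scores.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("no scores to summarize")
    positive = sum(1 for s in scores if s > threshold)
    # Bin index 0..4 for [0.0, 0.2), [0.2, 0.4), ..., [0.8, 1.0];
    # min(..., 4) folds a score of exactly 1.0 into the top bin.
    bins = Counter(min(int(s * 5), 4) for s in scores)
    histogram = {
        f"{i / 5:.1f}-{(i + 1) / 5:.1f}": bins.get(i, 0) for i in range(5)
    }
    return {"positive_share": positive / len(scores), "histogram": histogram}
```

For example, `summarize_sentiments([0.95, 0.88, 0.10, 0.03])` returns a positive share of 0.5, with two scores in the 0.0-0.2 bin and two in the 0.8-1.0 bin. These four scores are made-up inputs, not data from the paper.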
Authors: XING Yuqi (邢羽琪), YANG Cheng (杨柽); School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China
Source: Software Engineering (《软件工程》), 2023, No. 12, pp. 41-45 (5 pages)
Keywords: deep web crawler; JavaScript encryption; reverse technology; Ajax; data mining
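The abstract does not disclose the target site's API, so the following Python sketch only illustrates the general pattern behind dynamically constructing an encrypted review URL. Once the page's JavaScript signing routine has been reverse-engineered, it is re-implemented in Python and used to produce a freshly signed URL for each product and review page. The endpoint `https://example.com/api/comments`, the parameters `t`, `sign` and `data`, the constant `APP_KEY`, the `token` argument, and the MD5-over-joined-fields signing scheme are all hypothetical placeholders, not the scheme used in the paper.

```python
import hashlib
import json
import time
from typing import Optional
from urllib.parse import urlencode

# Hypothetical placeholders: the paper's real endpoint, parameter names and
# signing algorithm are not given in the abstract.
BASE_URL = "https://example.com/api/comments"
APP_KEY = "demo-app-key"


def sign(token: str, timestamp: str, data: str) -> str:
    """Python port of a hypothetical JS signing function.

    Joins the session token, timestamp, app key and request payload with '&'
    and returns the lowercase hex MD5 digest (32 characters).
    """
    raw = "&".join([token, timestamp, APP_KEY, data])
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def build_comment_url(product_id: str, page: int, token: str,
                      timestamp: Optional[str] = None) -> str:
    """Build a signed review-page URL for one product and page number.

    `token` stands for a session value issued by the site (hypothetical).
    If no timestamp is given, the current time in milliseconds is used,
    so each call produces a fresh signature.
    """
    ts = timestamp if timestamp is not None else str(int(time.time() * 1000))
    # Compact JSON so the signed string matches the transmitted one exactly.
    data = json.dumps({"productId": product_id, "page": page}, separators=(",", ":"))
    params = {"t": ts, "sign": sign(token, ts, data), "data": data}
    return f"{BASE_URL}?{urlencode(params)}"
```

A crawler built this way would loop over the product IDs of a category and their review pages, for instance `[build_comment_url(pid, p, token) for pid in product_ids for p in range(1, 4)]`, and request each URL. Re-implementing the signing logic in Python, rather than driving a browser, keeps each request cheap, at the cost of having to update the port whenever the site changes its JavaScript.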