Study on Algorithm of BBS Data Mining Based on URL Location Information (基于URL定位信息的BBS数据挖掘方法研究)

Cited by: 2

Abstract: Using the collection order of Web pages together with the related information and topics of the retrieved pages, a topic-partitioned web crawler algorithm divides and integrates as much of the Web as possible by topic, and uses URL location information to improve the efficiency of page retrieval. Simulation experiments on a real BBS show that the proposed topic-focused crawler algorithm can cross the fracture zones between URL pages in the BBS, which improves the recall rate of URL pages and keeps retrieval from stopping at a break between pages. Precision analysis shows that the misjudged points all lie close to the dividing line with small deviation, indicating that the precision of the algorithm is high.
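The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: a topic-focused crawl that reads a page's position from its URL and keeps going across a short run of off-topic pages instead of stopping at the first one. The toy page set, the `board` and `page` query parameters, the `max_gap` tolerance, and the keyword-based relevance test are all assumptions made for illustration and are not taken from the paper.

```python
from collections import deque
from urllib.parse import urlparse, parse_qs

# Hypothetical toy BBS: URL -> (page text, outgoing links). Not real data.
BASE = "http://bbs.example/thread"
PAGES = {
    f"{BASE}?board=7&page=1": ("web crawler design notes", [f"{BASE}?board=7&page=2"]),
    f"{BASE}?board=7&page=2": ("this post was deleted", [f"{BASE}?board=7&page=3"]),
    f"{BASE}?board=7&page=3": ("crawler recall on forum pages", [f"{BASE}?board=7&page=4"]),
    f"{BASE}?board=7&page=4": ("focused crawler queue ordering", []),
}
TOPIC_TERMS = {"crawler"}  # assumed topic description


def url_location(url):
    """Extract (board, page) from the query string as the page's location."""
    query = parse_qs(urlparse(url).query)
    return int(query["board"][0]), int(query["page"][0])


def is_relevant(text):
    """Crude topic test: does the page share any word with the topic terms?"""
    return bool(TOPIC_TERMS & set(text.lower().split()))


def crawl(seed, max_gap):
    """Breadth-first crawl tolerating up to max_gap consecutive off-topic pages."""
    queue = deque([(seed, 0)])  # (url, off-topic pages crossed in a row)
    seen = {seed}
    hits = []
    while queue:
        url, gap = queue.popleft()
        text, links = PAGES[url]
        if is_relevant(text):
            hits.append(url)
            gap = 0  # back on topic, reset the gap counter
        else:
            gap += 1
            if gap > max_gap:
                continue  # the break is too wide: abandon this path
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append((link, gap))
    return sorted(hits, key=url_location)


def recall(hits):
    """Fraction of all on-topic pages in the toy set that the crawl found."""
    relevant = [u for u, (text, _) in PAGES.items() if is_relevant(text)]
    return len(hits) / len(relevant)


seed = f"{BASE}?board=7&page=1"
strict = crawl(seed, max_gap=0)    # stops at the deleted page 2
tolerant = crawl(seed, max_gap=1)  # crosses page 2 and reaches pages 3 and 4
print(len(strict), recall(strict))
print(len(tolerant), recall(tolerant))
```

In this toy setup the strict crawl finds only the seed page, while the tolerant crawl reaches every on-topic page, which is the kind of recall gain the abstract attributes to crossing a fracture zone.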

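Authors: Zhao Zhe (赵哲), Ma Xiaojun (马晓珺)

Affiliations: School of Computer and Information Engineering, Anyang Normal University; Department of Public Computer Teaching, Anyang Normal University

Source: 科技通报 (Bulletin of Science and Technology), Peking University Core Journal, 2014, No. 4, pp. 206-208 (3 pages)

Keywords: network crawler algorithm; URL location information; BBS; information retrieval; data mining

Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]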


