Study on Algorithm of BBS Data Mining Based on URL Location Information (cited by: 2)
Abstract: Using the crawl order of Web pages together with the related information and topics of the retrieved pages, a topic-partitioned web crawler algorithm divides and integrates as much of the Web as possible by topic, and uses URL location information to improve the efficiency of page retrieval. Simulation experiments on a real BBS show that the proposed topic-focused crawler can cross the fracture zones between URL pages in the BBS, which improves the recall of URL pages and keeps retrieval from stopping when a break in the page links is encountered. Precision analysis shows that the misjudged points all lie close to the dividing line with small deviations, indicating that the algorithm is reasonably accurate.
Authors: 赵哲 (Zhao Zhe), 马晓珺 (Ma Xiaojun)
Source: 《科技通报》 (Bulletin of Science and Technology), PKU Core Journal, 2014, No. 4, pp. 206-208 (3 pages)
Keywords: network crawler algorithm; URL location information; BBS information retrieval; data mining
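
The abstract describes a topic-focused crawler that keeps going across "fracture zones" (runs of off-topic pages) and that uses URL location information, with recall as the measure of success. The record does not give the algorithm itself, so the following is only a minimal sketch of that general idea, not the authors' method. Everything in it is an assumption made for illustration: the toy in-memory `PAGES` graph, the keyword-overlap `relevance` score, the `same_board` URL-directory cue, the 0.5 priority boost, and the `max_gap` tolerance parameter.

```python
import heapq
from urllib.parse import urlparse

# Toy in-memory "BBS": URL -> (page text, outgoing links). Entirely made up.
PAGES = {
    "http://bbs.example/board/python/": ("python crawler board index", [
        "http://bbs.example/board/python/t1",
        "http://bbs.example/board/python/ads",
    ]),
    "http://bbs.example/board/python/t1": ("python crawler tips", []),
    "http://bbs.example/board/python/ads": ("buy cheap watches", [
        "http://bbs.example/board/python/t2",
    ]),
    "http://bbs.example/board/python/t2": ("crawler recall in python", []),
}
TOPIC = {"python", "crawler"}  # hypothetical topic keywords
SEED = "http://bbs.example/board/python/"


def relevance(text):
    """Fraction of topic keywords present in the page text (assumed scoring)."""
    return len(TOPIC & set(text.split())) / len(TOPIC)


def same_board(url_a, url_b):
    """URL-location cue (assumed): both URLs share the same parent directory."""
    dir_a = urlparse(url_a).path.rsplit("/", 1)[0]
    dir_b = urlparse(url_b).path.rsplit("/", 1)[0]
    return dir_a == dir_b


def crawl(seed, max_gap, threshold=0.5):
    """Best-first crawl tolerating up to max_gap consecutive off-topic pages."""
    # Heap entries: (negative priority, url, off-topic pages in a row so far)
    frontier = [(-1.0, seed, 0)]
    seen = {seed}
    relevant = []
    while frontier:
        _, url, gap = heapq.heappop(frontier)
        text, links = PAGES[url]
        score = relevance(text)
        if score >= threshold:
            relevant.append(url)
            gap = 0  # back on topic: reset the fracture counter
        else:
            gap += 1
            if gap > max_gap:
                continue  # fracture too wide: stop following this path
        for link in links:
            if link in seen or link not in PAGES:
                continue
            seen.add(link)
            # Links that stay in the same board directory get a priority boost.
            priority = score + (0.5 if same_board(url, link) else 0.0)
            heapq.heappush(frontier, (-priority, link, gap))
    return relevant


def recall(found, truth):
    """Share of truly on-topic pages that the crawl retrieved."""
    return len(set(found) & set(truth)) / len(truth)


TRUTH = [u for u, (text, _) in PAGES.items() if relevance(text) >= 0.5]
strict = crawl(SEED, max_gap=0)    # stops at the first off-topic page
tolerant = crawl(SEED, max_gap=1)  # crosses one off-topic page
```

In this toy graph the thread `t2` is reachable only through the off-topic `ads` page, so the strict crawl misses it while the tolerant crawl retrieves it. Comparing `recall(strict, TRUTH)` with `recall(tolerant, TRUTH)` shows the kind of recall gain the abstract attributes to crossing fracture zones.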