
A Method Based on LCS for Detecting Similar Microblog Pages
Abstract: Microblogs are relationship-based platforms for sharing, spreading and acquiring information. They are a source of online public opinion and an important channel for information dissemination. Because reposting (forwarding) a microblog is so easy, large numbers of identical or near-identical microblog pages spread rapidly through the microblog space. Detecting these similar pages is therefore important both for reducing users' browsing burden and for making online public-opinion analysis more efficient. This paper proposes an LCS (Longest Common Subsequence) based method for detecting similar microblog pages: first, it computes subsets of documents whose microblog pages are likely to be similar; next, it computes the LCS of those pages and extracts its reliable parts; finally, it identifies the similar microblog pages. Experiments show that the method detects similar pages in microblog data accurately and efficiently.
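The abstract describes the pipeline only at a high level (candidate subsets, then LCS, then detection). The sketch below illustrates that general shape under several assumptions that do not come from the paper: texts are compared character by character, similarity is the LCS length divided by the longer text's length, candidate pairs are found with a character-bigram inverted index, and the values `threshold=0.8` and `min_shared=3` are arbitrary. All function names (`lcs_length`, `lcs_similarity`, `candidate_pairs`, `detect_similar`) are hypothetical, and the paper's step of extracting the "reliable parts" of the LCS is not reproduced because the abstract does not define it.

```python
from collections import Counter, defaultdict
from itertools import combinations


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP, O(len(a) * len(b)) time)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each DP row is as short as possible
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1  # extend the LCS of the two prefixes
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop one character from either prefix
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical score: LCS length divided by the longer text's length (1.0 only for identical texts)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty texts are treated as identical
    return lcs_length(a, b) / longest


def candidate_pairs(docs: list[str], n: int = 2, min_shared: int = 3) -> list[tuple[int, int]]:
    """Hypothetical pre-filter: pairs of documents sharing at least `min_shared` distinct character n-grams."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for i in range(len(text) - n + 1):
            index[text[i:i + n]].add(doc_id)  # inverted index: n-gram -> ids of docs containing it
    shared: Counter = Counter()
    for ids in index.values():
        for pair in combinations(sorted(ids), 2):
            shared[pair] += 1  # one count per distinct n-gram the pair has in common
    return [pair for pair, count in shared.items() if count >= min_shared]


def detect_similar(docs: list[str], threshold: float = 0.8) -> list[tuple[int, int, float]]:
    """Run LCS only on candidate pairs and keep those whose score reaches the (hypothetical) threshold."""
    found = []
    for i, j in candidate_pairs(docs):
        score = lcs_similarity(docs[i], docs[j])
        if score >= threshold:
            found.append((i, j, round(score, 3)))
    return sorted(found)


sample_docs = [
    "Typhoon warning issued for the coast tonight",
    "RT Typhoon warning issued for the coast tonight!",
    "New phone released today with a bigger screen",
]
print(detect_similar(sample_docs))  # → [(0, 1, 0.917)]
```

The quadratic LCS step is the expensive part, which is why restricting it to a candidate subset matters; the repost (document 1) contains the original (document 0) as a subsequence, so their LCS is 44 characters out of 48. Character-level comparison seems a natural fit for Chinese microblog text, which has no whitespace word boundaries, but a word-level tokenizer could be swapped in.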
Author: 张宗福 (Zhang Zongfu)
Source: 《集成技术》 (Journal of Integration Technology), 2013, Issue 3, pp. 5-9 (5 pages)
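Funding: National Natural Science Foundation of China (Grant No. 61272013); Guangdong Provincial Education Science "12th Five-Year Plan" 2012 Research Project (Grant No. 2010TJK311)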
Keywords: LCS (Longest Common Subsequence); near-duplicate (similarity) detection; similarity measurement; microblog page