
Web Site Logical Domain Mining Based on Web Page Block Cluster

Cited by: 1
Abstract: Web logical domain mining is one of the current research hotspots in Web mining. It emphasizes finding, from the site designer's point of view, the Web pages within a site that are logically related, so that they form a logical domain; it is not simply text clustering or hyperlink ranking. The definition of a site logical domain varies with the application. After a comprehensive analysis of several representative site logical domain models and their mining methods, this paper proposes a Web site logical domain mining model and a mining algorithm based on Web page block clustering. The experimental results show that the algorithm is stable and adaptable: its precision is not affected by factors such as site scale, language, or mirroring, while its recall increases as the number of retrieved Web pages grows.
Source: Computer Engineering (《计算机工程》; indexed in CAS, CSCD, and the Peking University Core Journals list), 2007, No. 4, pp. 52-54, 57 (4 pages)
Keywords: Web page block; Web logical domain; Web mining; block granularity

References (6)

  • 1. Li Wensyan, Kolak O, Wu Quoc, et al. Defining Logical Domains in a Web Site[C]//Proceedings of the 11th ACM Conference on Hypertext. San Antonio. 2000-05: 123-132.
  • 2. Eiron N, McCurley K S. Untangling Compound Documents on the Web[C]//Proceedings of the 14th ACM Conference on Hypertext and Hypermedia. 2003: 85-94.
  • 3. Bar-Yossef Z, Rajagopalan S. Template Detection via Data Mining and Its Application[C]//Proceedings of the 11th International Conference on World Wide Web. ACM Press, 2002: 580-591.
  • 4. Lin Shianhua, Ho Janming. Discovering Informative Content Blocks from Web Documents[C]//Proceedings of ACM SIGKDD'02. 2002.
  • 5. Liu Z, Ng W K, Lim E P. An Automated Algorithm for Extracting Website Skeleton[C]//Proceedings of the 9th International Conference on Database Systems for Advanced Applications. 2004-03-17.
  • 6. Deng Cai, Yu Shipeng. Extracting Content Structure for Web Pages Based on Visual Representation[C]//Proc. of the 5th Asia Pacific Web Conference. 2003.
