Research on Identifying Topic Information Blocks Based on a Webpage Coordinate System (基于网页坐标系的主题信息块判定研究)

Exploring the Identifying Key Information with Coordinate System in Webpage
Abstract: The VIPS (Vision-based Page Segmentation) theory is applied within a webpage coordinate system to judge the importance of the information blocks in a webpage. The method draws on two sources: the design conventions followed when webpages are created, and the visual focus of people as they browse information. It divides the page into regions according to a nine-square grid and identifies topic information on that basis. Finally, webpages from news websites are selected, and the relationship between the spatial level of information blocks and the extraction of topic information blocks is tested under different page segmentation ratios.
Author: Zhang Li (张力)
Source: Journal of Zhanjiang Normal College (《湛江师范学院学报》), 2014, No. 6, pp. 106-113 (8 pages)
Keywords: webpage; nine-square grid; VIPS; key information identification; noise elimination
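
The abstract describes placing information blocks on a nine-square grid laid over the page and varying the segmentation ratio. A minimal Python sketch of that idea is given here for illustration only. The `Block` type, the `grid_cell` and `is_topic_candidate` functions, the use of the block center, and the "center cell" rule are all assumptions of this sketch and are not taken from the paper, whose actual judgment criteria are not given in the abstract.

```python
from bisect import bisect_right
from typing import NamedTuple, Tuple


class Block(NamedTuple):
    """Axis-aligned bounding box of an information block, in page pixels."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def grid_cell(block: Block, page_w: float, page_h: float,
              cuts: Tuple[float, float] = (1 / 3, 2 / 3)) -> Tuple[int, int]:
    """Return (row, col), each in 0..2, of the grid cell holding the block's center.

    `cuts` are the two split positions as fractions of the page size.
    Using the same fractions on both axes is an assumption of this sketch.
    """
    cx = (block.x + block.width / 2) / page_w   # normalized center, x axis
    cy = (block.y + block.height / 2) / page_h  # normalized center, y axis
    col = bisect_right(cuts, cx)  # number of cut positions at or left of cx
    row = bisect_right(cuts, cy)  # number of cut positions at or above cy
    return row, col


def is_topic_candidate(block: Block, page_w: float, page_h: float,
                       cuts: Tuple[float, float] = (1 / 3, 2 / 3)) -> bool:
    """Hypothetical rule: a block centered in the middle cell is a candidate."""
    return grid_cell(block, page_w, page_h, cuts) == (1, 1)


# Example page of 1200 x 900 pixels with three made-up blocks.
article = Block(x=400, y=300, width=400, height=300)
header = Block(x=0, y=0, width=1200, height=90)
sidebar = Block(x=0, y=100, width=200, height=700)
```

Changing `cuts`, for example to `(0.2, 0.8)`, widens the center cell, which is one way to experiment with different segmentation ratios on the same set of blocks.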