Web page mining technique based on semantic segmentation iterative method (cited by: 2)
Abstract: This paper proposes a layered iterative partitioning method based on page semantics and applies it to web page mining. A website's pages are iteratively partitioned into multiple layers, each with a different number of nodes. A layer that meets the requirements is then selected for data mining, which makes it possible to quickly locate a particular node in that layer. That node holds the main content being sought.
Source: Journal of Chongqing Technology and Business University (Natural Science Edition), 2007, No. 5, pp. 477-480, 498 (5 pages)
Keywords: web mining; web segmented iterative analysis; web region
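
The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea and not the authors' algorithm. It assumes a page has already been parsed into a tree of regions. The `Block` class, the node-count range used to choose a layer, and the "most text wins" rule for picking the main-content node are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """A page region (hypothetical model): its own text plus nested sub-regions."""
    name: str
    text: str = ""
    children: List["Block"] = field(default_factory=list)

    def total_text_length(self) -> int:
        # Length of the text in this block and in all of its descendants.
        return len(self.text) + sum(c.total_text_length() for c in self.children)


def build_layers(root: Block) -> List[List[Block]]:
    """Iteratively partition the page into layers.

    Layer 0 is the whole page. Each further layer replaces every block that
    can still be split with its children; blocks without children are
    carried forward unchanged.
    """
    layers = [[root]]
    while any(b.children for b in layers[-1]):
        next_layer: List[Block] = []
        for b in layers[-1]:
            next_layer.extend(b.children if b.children else [b])
        layers.append(next_layer)
    return layers


def pick_layer(layers: List[List[Block]], min_nodes: int, max_nodes: int) -> List[Block]:
    """Return the first layer whose node count is in range (assumed criterion)."""
    for layer in layers:
        if min_nodes <= len(layer) <= max_nodes:
            return layer
    return layers[-1]  # fall back to the finest partition


def main_content(layer: List[Block]) -> Block:
    """Pick the node with the most text (assumed heuristic for 'main content')."""
    return max(layer, key=Block.total_text_length)


# A toy page tree used only to exercise the functions above.
page = Block("body", children=[
    Block("header", text="Site name"),
    Block("main", children=[
        Block("nav", text="Home | News | About"),
        Block("article", text="Long article text " * 20),
        Block("sidebar", text="Related links"),
    ]),
    Block("footer", text="Copyright"),
])

layers = build_layers(page)
layer = pick_layer(layers, min_nodes=4, max_nodes=8)
best = main_content(layer)
print([len(l) for l in layers], best.name)
```

In this toy page the partition produces layers of 1, 3, and 5 nodes. The 5-node layer falls inside the assumed range, and the `article` block is selected because it carries the most text. A real criterion for choosing the layer and the node would have to come from the paper itself.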
References (7)

  • 1. CHEN Y. Detecting web page structure for adaptive viewing on small form factor devices [C]. In Proceedings of the 11th World Wide Web Conference (WWW 12), 2003: 55-61.
  • 2. LI Ming, ZHANG Weiqun. Research on Web page cleaning techniques based on tag trees [J]. Journal of Southwest China Normal University (Natural Science Edition), 2006, 31(5): 128-131. Cited by: 3
  • 3. CHANG Yuhong, JIANG Zhe, ZHU Xiaoyan. Page structure analysis based on a tag-tree representation [J]. Computer Engineering and Applications, 2004, 40(16): 129-132. Cited by: 24
  • 4. ZHANG Wenbin, CHEN Enhong, WANG Jin. A multi-way-tree-based method for converting HTML to XML [J]. Mini-Micro Systems, 2003, 24(4): 713-715. Cited by: 4
  • 5. CAI D, YU S, WEN J. Block-based Web Search [C]. In the 27th Annual International ACM SIGIR Conference (SIGIR 2004), 2004: 42-47.
  • 6. CAI D, HE X, WEN J. Block-level Link Analysis [C]. In the 27th Annual International ACM SIGIR Conference (SIGIR 2004), 2004: 48-52.
  • 7. CAI D, HE X, LI Z. Hierarchical Clustering of WWW Image Search Results Using Visual, Textual and Link Analysis. In 12th ACM International Conference on Multimedia, New York City, USA, 2004: 52-56.
