Study on the classification and identification of Blog pages (original title: Blog网页分类与识别技术研究)

Cited by: 6
Abstract: The goal is an automatic way to distinguish Blog pages from other Web pages, so that content can be extracted from Blog corpora and regularities in Blog communities can be studied and discovered. The paper describes some basic concepts and ideas in the Blog area and, based on the characteristics and regularities of Blog pages, proposes a method that identifies Blog pages by computing a similarity from page structure and keywords. Preliminary experimental results show that the method achieves a high rate of correct identification.
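Source: Journal on Communications (《通信学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2007, No. 12, pp. 156-160 (5 pages)
Funding: National Natural Science Foundation of China (60736044); National High Technology Research and Development Program of China ("863" Program) (2006AA01Z150, 2004AA11701008)
Keywords: Blog page identification; similarity computing; Web page classification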
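
The abstract says only that Blog pages are identified by a similarity computed from page structure and keywords; it does not give the formula. The sketch below is one possible reading, not the authors' method. It assumes that structure is represented by counts of root-to-node tag paths, that keywords are counted in the visible text, that both are compared with a reference Blog page by cosine similarity, and that the two scores are mixed with a weight `alpha`. The keyword list, `alpha`, and all function names are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Hypothetical keyword list; the paper's actual keywords are not given in the abstract.
KEYWORDS = ("blog", "comment", "trackback", "archive", "permalink")


class TagPathCollector(HTMLParser):
    """Collects root-to-node tag paths and text from an HTML page."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = Counter()
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        self.text.append(data)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def profile(html):
    """Return (tag-path counts, keyword counts) for one page."""
    collector = TagPathCollector()
    collector.feed(html)
    text = " ".join(collector.text).lower()
    keywords = Counter({k: text.count(k) for k in KEYWORDS if k in text})
    return collector.paths, keywords


def blog_similarity(page_html, reference_html, alpha=0.5):
    """Weighted mix of structural and keyword similarity to a reference Blog page."""
    page_paths, page_kw = profile(page_html)
    ref_paths, ref_kw = profile(reference_html)
    structure = cosine(page_paths, ref_paths)
    keyword = cosine(page_kw, ref_kw)
    return alpha * structure + (1 - alpha) * keyword
```

A page would then be labeled as a Blog page when `blog_similarity` exceeds some threshold chosen on labeled data; the abstract reports no threshold, so any value used here would be an assumption.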