
Research on the Message Block Division and Characteristics Extraction of the Expert Homepage (专家主页的信息块划分及特征提取研究)

Cited by: 1
Abstract: Mining the information in expert homepages is of considerable research value, so describing the features of an expert homepage in a way that lets its entity content be identified is the most critical step in the mining process. This paper divides the expert homepage into its main information blocks and introduces the main methods for identifying those blocks. The authors annotated 2,000 expert homepages using Dreamweaver, then used text features, visual features, and structural features to characterize four kinds of content on these pages (the expert's basic information, research interests, research projects, and publications) and so complete the feature construction.
Source: Information Studies: Theory & Application (《情报理论与实践》), CSSCI, PKU Core, 2013, No. 10, pp. 109-113 (5 pages)
Keywords: expert homepage; information characteristics; information extraction; research method
