
Management of metadata in topic-specific Web information gathering
Abstract This paper discusses the role of metadata in topic-specific Web information gathering and analyzes the common types of Web metadata. Three types, Href, Anchor Text and Surrounding Text, are identified as the most suitable basis for topic-specific gathering. Using association rule mining, support and confidence are combined into a criterion for judging relevance. Together with forbidden-word (stop-word) filtering and relevance-strategy filtering, this yields a method for metadata extraction and iterative topic expansion. Experimental results indicate that, with the proposed metadata processing strategy, the topic-related terms agree well with the actually relevant terms and that false inclusions and false exclusions are reduced, which provides a sound basis for topic-specific Web information gathering.
Source Journal of Huazhong University of Science and Technology (Natural Science Edition), 2006, No. 10, pp. 37-40 (4 pages). Indexed in EI, CAS, CSCD and the Peking University Core Journals list.
Funding Supported by the National Natural Science Foundation of China (60574025, 60074008) and the Natural Science Foundation of Hubei Province (2004ABA055).
Keywords topic-specific information gathering; metadata; extraction; topic expansion
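
The abstract describes combining support and confidence into a relevance criterion and filtering forbidden words before expanding the topic. The record gives no formulas or thresholds, so the following is only a minimal sketch under stated assumptions: each link's metadata (for example its Anchor Text and Surrounding Text) is already tokenized into a set of terms, and the function name `expand_topic`, the stop list `STOP_WORDS`, the thresholds and the sample records are all hypothetical and not taken from the paper.

```python
from collections import Counter

# Hypothetical forbidden-word (stop-word) list; the paper's actual list is not given.
STOP_WORDS = {"click", "here", "home", "more"}


def expand_topic(records, topic, min_support=0.3, min_confidence=0.6):
    """Return terms associated with `topic` by the rule topic -> term.

    records: list of sets of terms, one set per link's metadata (assumed input)
    topic:   a seed topic term
    Thresholds are illustrative defaults, not values from the paper.
    """
    n = len(records)
    with_topic = [r for r in records if topic in r]
    if n == 0 or not with_topic:
        return {}

    # Count, per candidate term, the records that contain both it and the topic,
    # skipping the topic itself and any forbidden word.
    co_counts = Counter(
        term
        for r in with_topic
        for term in r
        if term != topic and term not in STOP_WORDS
    )

    related = {}
    for term, both in co_counts.items():
        support = both / n                    # share of all records with topic and term
        confidence = both / len(with_topic)   # share of topic records that also have term
        # A term is accepted only if it passes both thresholds.
        if support >= min_support and confidence >= min_confidence:
            related[term] = (support, confidence)
    return related


# Hypothetical sample data for illustration only.
records = [
    {"python", "tutorial", "click"},
    {"python", "tutorial", "download"},
    {"python", "tutorial", "here"},
    {"java", "tutorial"},
    {"python", "snake"},
]
print(expand_topic(records, "python"))
```

Requiring both thresholds is one plausible way to "combine" the two measures: support guards against rare coincidences, while confidence guards against terms that are merely frequent everywhere. An iterative expansion could feed the accepted terms back in as new seeds until no further terms qualify.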