An Algorithm of Applying Linkage Information to Retrieving Key Resources (一种利用链接信息检索关键资源的算法)

Cited by: 2

Abstract: With the growth of the Internet, Web-based information processing has attracted increasing attention and is a frontier of current research. This paper examines how to build on existing retrieval technology and use the link information of Web pages to automatically obtain higher-quality retrieval results, called key resources, from the basic retrieval results. We propose a new method that uses the structure and content information of Web pages together with their link information: first, a page's structure information is combined with its content score to give the page's document score; then the page's link score is computed from the document scores of the pages it links to. Experiments show that the method reduces the interference of useless ("noisy") links and performs much better than using link information alone.
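The two-step scoring described in the abstract can be sketched as follows. This is only an illustrative sketch: the abstract does not give the paper's formulas, so the multiplicative combination in `document_score`, the plain sum in `link_scores`, and all function and parameter names here are assumptions rather than the authors' actual method.

```python
from typing import Dict, List


def document_score(content_score: float, structure_weight: float) -> float:
    """Combine a page's content score with a structure-based weight.

    Assumption: a simple product. The paper's real combination is not
    stated in the abstract.
    """
    return content_score * structure_weight


def link_scores(
    doc_scores: Dict[str, float],
    out_links: Dict[str, List[str]],
) -> Dict[str, float]:
    """Score each page by the document scores of the pages it links to.

    Assumption: an unweighted sum over distinct out-links. Targets with no
    document score (for example, pages outside the retrieved set) add 0,
    so links to irrelevant pages do not raise the score.
    """
    scores: Dict[str, float] = {}
    for page in doc_scores:
        # Ignore duplicate links and self-links.
        targets = set(out_links.get(page, [])) - {page}
        scores[page] = sum(doc_scores.get(t, 0.0) for t in targets)
    return scores


def rank_key_resources(
    doc_scores: Dict[str, float],
    out_links: Dict[str, List[str]],
) -> List[str]:
    """Rank pages by link score; ties fall back to document score, then name."""
    ls = link_scores(doc_scores, out_links)
    return sorted(doc_scores, key=lambda p: (-ls[p], -doc_scores[p], p))


if __name__ == "__main__":
    # Hypothetical toy data: "offtopic" was not retrieved, so it has no score.
    docs = {
        "hub": document_score(0.5, 1.0),
        "a": document_score(1.0, 0.5),
        "b": document_score(0.5, 0.5),
    }
    links = {"hub": ["a", "b", "offtopic"], "a": ["b"], "b": []}
    print(rank_key_resources(docs, links))
```

In this sketch a page that links to several high-scoring documents ranks above the documents themselves, which matches the abstract's idea of a key resource, while a link to an unscored page contributes nothing.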

Authors: Gu Jian (顾健), Huang Xuanjing (黄萱菁), Wu Lide (吴立德)

Affiliation: Department of Computer Science and Engineering, Fudan University

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2004, No. 10, pp. 189-192 (4 pages)

Funding: National Natural Science Foundation of China (69935010, 60103014); 863 Program (2001AA114120, 2002AA142090)

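Keywords: link information; key resource; Web retrieval; link; document; Web page; Web-based; structure information; algorithm; information retrieval; retrieval results; retrieval technology; frontier topic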

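Classification: TP393 [Automation and Computer Technology: Computer Application Technology]; TP317 [Automation and Computer Technology: Computer Software and Theory]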
