
An Information Retrieval Algorithm for Massive and Real-time Data (cited by: 1)
Abstract: The rapid growth of networked information resources has driven information retrieval technology toward maturity, yet conventional retrieval algorithms still show shortcomings when applied to massive, real-time network data. Based on the massive real-time network data environment of the CERNET East China (North) region, this paper studies and designs a two-phase vector clustering retrieval algorithm that provides efficient information processing through two stages of operation: insertion clustering and optimization clustering. In addition, a group mail discrimination application is implemented on top of the clustering tree to filter junk mail out of the network data, which further improves retrieval efficiency.
Source: Journal of South China University of Technology (Natural Science Edition) [EI, CAS, CSCD, Peking University Core], 2004, Issue z1, pp. 6-10 (5 pages)
Keywords: information retrieval; clustering; two-phase vector; mail discrimination
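
The abstract names two stages, insertion clustering and optimization clustering, but does not give their details. The following is only a minimal sketch of how such a two-phase scheme could look over term-frequency vectors. Everything in it is an assumption for illustration and not the authors' method: the cosine similarity measure, the `threshold` value, the merge rule, and the names `vectorize`, `Cluster`, `insert`, and `optimize` are all hypothetical, and the clusters are kept in a flat list rather than the clustering tree the paper refers to.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Cluster:
    """Hypothetical cluster: a summed term vector plus member document ids."""

    def __init__(self, vec, doc_id):
        self.centroid = Counter(vec)  # sum of member vectors (cosine ignores scale)
        self.members = [doc_id]

    def add(self, vec, doc_id):
        self.centroid.update(vec)
        self.members.append(doc_id)

    def absorb(self, other):
        self.centroid.update(other.centroid)
        self.members.extend(other.members)


def insert(clusters, vec, doc_id, threshold=0.5):
    """Phase 1 (assumed): put an arriving document into its most similar cluster,
    or start a new cluster when nothing is similar enough."""
    best = max(clusters, key=lambda c: cosine(c.centroid, vec), default=None)
    if best is not None and cosine(best.centroid, vec) >= threshold:
        best.add(vec, doc_id)
    else:
        clusters.append(Cluster(vec, doc_id))


def optimize(clusters, threshold=0.5):
    """Phase 2 (assumed): merge clusters whose summed vectors have become similar."""
    merged = []
    for c in clusters:
        for m in merged:
            if cosine(m.centroid, c.centroid) >= threshold:
                m.absorb(c)
                break
        else:
            merged.append(c)
    return merged


# Made-up sample messages, processed one at a time as they "arrive".
docs = [
    "cheap pills buy now",
    "buy cheap pills now",
    "meeting agenda for monday",
    "monday meeting agenda",
]
clusters = []
for doc_id, text in enumerate(docs):
    insert(clusters, vectorize(text), doc_id)
clusters = optimize(clusters)
```

Under these assumptions, a cluster that accumulates many near-identical messages could be treated as a candidate for group mail, which is one plausible reading of the mail discrimination application mentioned in the abstract.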

