Implementation of Chinese and English Clustering Engine Based on Improved Suffix Tree Algorithm (cited by: 1)
Abstract: This paper proposes ISTC, an algorithm that combines an improved suffix tree with the idea of interactive clustering. By modifying the traditional suffix tree structure, it performs hierarchical clustering of document titles and summaries, and it replaces the traditional recursive algorithm with interactive clustering. ISTC is language-independent: it applies to word-based Western languages, and it can also handle character-based Chinese text effectively without dictionary-based word segmentation. On the basis of this algorithm, an interactive clustering engine was designed and implemented. The system was tested in different network environments, and its performance was compared with that of other meta-search engines. The experimental results show that real-time interactive clustering with the improved suffix tree algorithm is feasible.
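Source: Journal of Jilin University: Science Edition (《吉林大学学报(理学版)》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2009, No. 2, pp. 299-304 (6 pages)
Funding: Science and Technology Development Program of Jilin Province (Grant No. 20070533)
Keywords: suffix tree; text clustering; meta-search engine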
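
The abstract does not give the details of ISTC, so the following is only a minimal sketch of the general idea it builds on: grouping snippets by shared phrases found through a suffix structure, with Western text tokenized into words and Chinese text into single characters so that no segmentation dictionary is needed. It is not the paper's algorithm. It uses a plain depth-limited suffix trie rather than a true linear-time suffix tree, and every name in it (`tokenize`, `build_suffix_trie`, `base_clusters`, `max_len`, `min_docs`) is a hypothetical choice made for illustration.

```python
import re
from collections import namedtuple

# A token is a run of Latin letters/digits, or one CJK ideograph.
# No dictionary is consulted (illustrative assumption, not from the paper).
TOKEN = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")

# children: token -> Node; docs: ids of snippets containing the phrase ending here
Node = namedtuple("Node", ["children", "docs"])


def new_node():
    return Node(children={}, docs=set())


def tokenize(text):
    """Split text into lowercase words (Latin script) or single characters (CJK)."""
    return [t.lower() for t in TOKEN.findall(text)]


def build_suffix_trie(snippets, max_len=4):
    """Insert every suffix of every snippet, truncated to max_len tokens."""
    root = new_node()
    for doc_id, text in enumerate(snippets):
        tokens = tokenize(text)
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:start + max_len]:
                node = node.children.setdefault(tok, new_node())
                node.docs.add(doc_id)
    return root


def base_clusters(root, min_docs=2):
    """Map each phrase shared by at least min_docs snippets to those snippet ids."""
    found = {}
    stack = [((), root)]
    while stack:
        phrase, node = stack.pop()
        for tok, child in node.children.items():
            # A longer phrase can only occur in a subset of these snippets,
            # so branches below the threshold are safe to skip.
            if len(child.docs) >= min_docs:
                longer = phrase + (tok,)
                found[longer] = child.docs
                stack.append((longer, child))
    return found
```

Calling `base_clusters(build_suffix_trie(snippets))` on a list of titles or summaries returns candidate groups keyed by their shared phrase. A fuller engine would still have to merge overlapping groups, rank them, and arrange them into a hierarchy, none of which this sketch attempts.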