
A New Generalized Similarity-Based Topic Distillation Algorithm

Abstract The procedure of hypertext induced topic search (HITS) is analyzed based on a semantic relation model, and the topic drift of the HITS algorithm is shown to occur because Web pages are projected onto a wrong latent semantic basis. A new concept, generalized similarity, is introduced, and on this basis a new topic distillation algorithm, GSTDA (generalized similarity based topic distillation algorithm), is presented to improve the quality of topic distillation. GSTDA is applied not only to avoid topic drift but also to explore topics related to the user query. Experimental results on 10 queries show that GSTDA reduces the topic drift rate by 10% to 58% compared with the HITS algorithm and discovers several related topics for queries that have multiple meanings.
Source Wuhan University Journal of Natural Sciences (武汉大学学报(自然科学英文版)), CAS, 2007, No. 5, pp. 789-792 (4 pages)
Funding Supported by the Shaanxi Provincial Educational Department Special-Purpose Technology and Research of China (06JK229)
Keywords generalized similarity; hypertext induced topic search; topic distillation; topic drift
