Corpus-based latent semantic information measurement

Published English title: Latent semantic information measurement of corpus orientation
Abstract This paper defines, for a keyword, an information measure tied to a topic or its semantics. A topic-based corpus is obtained first, and a latent semantic vector space model of that corpus is then built; the keyword's information measure is defined through this model. With it, the amount of topic information contained in any document can be computed, and the document's degree of membership in the topic can be defined. A threshold on the membership degree then decides whether a document belongs to the topic class. The experimental results show that an information measure associated with a topic or semantics can overcome the shortcomings of "word-match" search and achieve "semantic-match" search.
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2009, No. 9, pp. 2450-2453, 2467 (5 pages)
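Funding Key science and technology project of the Science and Technology Commission of Shanghai Municipality (055115001); undergraduate innovation project of Shanghai University of Engineering Science (cx082100)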
Keywords latent semantics; information measurement; metric distribution; membership degree
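
The pipeline outlined in the abstract (topic corpus, latent semantic vector space, keyword measure, document membership, threshold) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the function names, the use of a truncated SVD of a raw term-count matrix as the latent semantic model, the keyword measure (each term's normalized vector length in the latent space), and the membership degree (the summed measure of the distinct topic keywords a document contains). It is not the authors' definition.

```python
import numpy as np


def build_term_doc_matrix(docs):
    """Build a raw term-count matrix (terms x documents) from whitespace-tokenized docs."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0
    return A, index


def keyword_measures(A, k):
    """Hypothetical keyword measure: each term's vector length in the
    rank-k latent space, normalized so that all measures sum to 1."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]          # term coordinates in the latent space
    weights = np.linalg.norm(term_vecs, axis=1)
    return weights / weights.sum()


def membership(doc, index, measures):
    """Hypothetical membership degree in [0, 1]: summed measure of the
    distinct topic keywords that occur in the document."""
    return float(sum(measures[index[w]] for w in set(doc.split()) if w in index))


def belongs_to_topic(doc, index, measures, threshold):
    """Assign the document to the topic class when its membership reaches the threshold."""
    return membership(doc, index, measures) >= threshold


if __name__ == "__main__":
    # Toy topic corpus (made up for this sketch).
    corpus = [
        "latent semantic analysis of text",
        "semantic search with latent topics",
        "text search and topic retrieval",
    ]
    A, index = build_term_doc_matrix(corpus)
    measures = keyword_measures(A, k=2)
    for doc in ["latent semantic search", "football match score"]:
        print(doc, "->", membership(doc, index, measures))
```

In this sketch a document sharing no vocabulary with the topic corpus gets membership 0, and the threshold passed to `belongs_to_topic` is a free parameter that would need tuning on real data.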