Synonym Recognition Based on User Behaviors in E-commerce

Cited by: 2
Abstract: This paper studies the automatic recognition of synonyms in the e-commerce domain. E-commerce synonyms are different expressions of the same thing or concept, i.e., words that can replace one another in product descriptions and in search. Because the domain contains many new words, typos, and near-synonyms, the paper proposes a synonym recognition method based on user behaviors. First, candidate synonym sets are collected in two ways: by splitting product titles at coordination symbols (symbols that join parallel terms) and by clustering queries following the idea of SimRank. Next, literal (surface-form) features of each word pair are extracted, along with user-behavior features derived from titles, queries, and clicks. Finally, a Gradient Boost Decision Tree (GBDT) model decides whether the two words are synonyms. The experimental results show that GBDT is well suited to this task, achieving a precision of 56.52%.
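The candidate-generation and feature steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The set of coordination symbols (`/`, `、`, `|`), the damping factor `c=0.8`, the iteration count, the threshold, the toy click log, and all function names (`title_candidates`, `bipartite_simrank`, `similar_query_pairs`, `literal_features`) are hypothetical. SimRank is shown in its bipartite form (Jeh and Widom) over (query, clicked item) pairs, because the abstract does not state the exact graph the paper uses. The GBDT classifier that consumes these outputs is omitted.

```python
import re
from collections import defaultdict
from itertools import combinations, product

# Assumed coordination symbols; the abstract does not list the actual set.
COORD_SYMBOLS = re.compile(r"[/、|]")


def title_candidates(title):
    """Collect candidate pairs from title tokens joined by a coordination
    symbol, e.g. the token "手机/移动电话" yields ("手机", "移动电话")."""
    pairs = set()
    for token in title.split():
        parts = [p for p in COORD_SYMBOLS.split(token) if p]
        for a, b in combinations(parts, 2):
            if a != b:
                pairs.add(tuple(sorted((a, b))))
    return pairs


def bipartite_simrank(clicks, c=0.8, iterations=5):
    """Bipartite SimRank over (query, clicked item) pairs.

    Two queries are similar if the items clicked for them are similar, and
    two items are similar if the queries leading to them are similar.
    Returns a dict mapping (query_a, query_b) to a score in [0, 1].
    Naive O(n^2 * d^2) loops: suitable for toy data only.
    """
    items_of = defaultdict(set)    # query -> clicked items
    queries_of = defaultdict(set)  # item -> queries that led to a click
    for query, item in clicks:
        items_of[query].add(item)
        queries_of[item].add(query)
    queries, items = sorted(items_of), sorted(queries_of)

    def identity(nodes):
        return {(a, b): float(a == b) for a in nodes for b in nodes}

    q_sim, i_sim = identity(queries), identity(items)
    for _ in range(iterations):
        # Both updates read only the previous iteration's scores.
        new_q, new_i = {}, {}
        for a, b in product(queries, queries):
            if a == b:
                new_q[(a, b)] = 1.0
                continue
            total = sum(i_sim[(x, y)] for x in items_of[a] for y in items_of[b])
            new_q[(a, b)] = c * total / (len(items_of[a]) * len(items_of[b]))
        for a, b in product(items, items):
            if a == b:
                new_i[(a, b)] = 1.0
                continue
            total = sum(q_sim[(x, y)] for x in queries_of[a] for y in queries_of[b])
            new_i[(a, b)] = c * total / (len(queries_of[a]) * len(queries_of[b]))
        q_sim, i_sim = new_q, new_i
    return q_sim


def similar_query_pairs(q_sim, threshold=0.5):
    """Keep each unordered query pair whose score reaches the threshold."""
    return {(a, b) for (a, b), s in q_sim.items() if a < b and s >= threshold}


def literal_features(a, b):
    """Surface-form features: character Jaccard overlap and edit distance."""
    sa, sb = set(a), set(b)
    union = sa | sb
    jaccard = len(sa & sb) / len(union) if union else 0.0
    prev = list(range(len(b) + 1))  # Levenshtein DP, one row at a time
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return {"char_jaccard": jaccard, "edit_distance": prev[-1]}


# Hypothetical toy click log: (query, clicked item).
clicks = [("手机", "item1"), ("移动电话", "item1"), ("手机", "item2"),
          ("移动电话", "item2"), ("耳机", "item3")]
sims = bipartite_simrank(clicks)
print(similar_query_pairs(sims))
print(literal_features("手机", "移动电话"))
```

On this toy log, `手机` (mobile phone) and `移动电话` (cellular phone) share both clicked items, so by symmetry their score climbs toward the fixed point of s = c(2 + 2s)/4, which is 2/3 for c = 0.8; `耳机` (earphones) shares no items and stays at 0. Pairs surviving the threshold, together with their literal and behavior features, would be the inputs to the GBDT step.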
Institution: Harbin Institute of Technology
Source: Journal of Chinese Information Processing, 2012, No. 3, pp. 79-85 (7 pages); indexed in CSCD and the Peking University Core Journals list
Funding: National Natural Science Foundation of China (grants 60975077 and 90924015)
Keywords: synonym recognition; user behaviors; SimRank; Gradient Boost Decision Tree