Abstract
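This paper studies the automatic recognition of synonyms in the e-commerce domain. Synonyms in this domain are different expressions for the same thing or concept, that is, words that can be substituted for one another in product descriptions and search. To address the domain's abundance of new words, typos, and near-synonyms, the paper proposes a synonym recognition method based on user behavior. Candidate sets are first obtained by two methods: splitting product titles at coordination markers, and clustering queries following the SimRank idea. Literal features of each word pair are then extracted together with user-behavior features drawn from titles, queries, and clicks, and a Gradient Boost Decision Tree model decides whether the two words are synonymous. Experiments show a synonym recognition precision of 56.52%.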
This paper focuses on synonym recognition in e-commerce and presents a method based on user behavior to cope with the many new words, typos, and near-synonyms in this domain. First, candidate synonym sets are retrieved by analyzing titles and their corresponding queries based on SimRank. Then literal, title, query, and click features are extracted. Finally, a Gradient Boost Decision Tree (GBDT) model is adopted to determine whether the candidates are true synonyms. The experimental results show that GBDT is more suitable for this task, achieving a precision of 56.52%.
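The SimRank-based candidate step can be illustrated with a minimal sketch. It assumes a bipartite click graph of (query, item) pairs and applies the standard bipartite SimRank recurrence. The function name `bipartite_simrank`, the decay value 0.8, the iteration count, and the toy click log are all hypothetical choices for illustration and are not taken from the paper, whose exact formulation may differ.

```python
from itertools import product


def bipartite_simrank(clicks, decay=0.8, iterations=5):
    """Return {(q1, q2): similarity} for queries in a (query, item) click log.

    Illustrative sketch only: two queries are similar when they lead to
    clicks on similar items, and two items are similar when they are
    clicked from similar queries.
    """
    q_nbrs, i_nbrs = {}, {}
    for query, item in clicks:
        q_nbrs.setdefault(query, set()).add(item)
        i_nbrs.setdefault(item, set()).add(query)

    def identity(nodes):
        # Start from s(a, a) = 1 and s(a, b) = 0 for a != b.
        return {(a, b): 1.0 if a == b else 0.0
                for a, b in product(nodes, repeat=2)}

    def update(nbrs, other_sim):
        # One SimRank step: average similarity of the neighbours, times decay.
        new = {}
        for a, b in product(nbrs, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
                continue
            total = sum(other_sim[(x, y)] for x in nbrs[a] for y in nbrs[b])
            new[(a, b)] = decay * total / (len(nbrs[a]) * len(nbrs[b]))
        return new

    q_sim, i_sim = identity(q_nbrs), identity(i_nbrs)
    for _ in range(iterations):
        # Both sides are updated from the previous iteration's scores.
        q_sim, i_sim = update(q_nbrs, i_sim), update(i_nbrs, q_sim)
    return q_sim


# Hypothetical click log: "sneaker" and "trainer" lead to the same items.
toy_clicks = [
    ("sneaker", "item1"), ("trainer", "item1"),
    ("sneaker", "item2"), ("trainer", "item2"),
    ("kettle", "item3"),
]
scores = bipartite_simrank(toy_clicks)
candidates = sorted(
    (pair for pair in scores if pair[0] < pair[1] and scores[pair] > 0),
    key=scores.get, reverse=True,
)
print(candidates)
```

Query pairs with a high score would then be passed on as candidate synonyms for feature extraction and classification. The all-pairs loop is quadratic in the number of queries, so a real query log would need pruning or a sparse formulation.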
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2012, Issue 3, pp. 79-85 (7 pages)
Funding
National Natural Science Foundation of China (60975077, 90924015)