Abstract
With the emergence of Web 2.0, e-commerce data are often entered by different websites and different users, so the same commodity may have many descriptions. This makes it difficult for users to search for and compare commodities. To address this, this paper classifies commodities based on their trade names, such that each category describes one real-world commodity. The proposed system splits each trade name into a set of keywords and classifies commodities by the similarity of their keyword sets. We study keyword splitting, set-based classification, keyword weight setting, and relevance feedback. The experimental results show that the proposed method classifies commodities quickly and effectively, and that the weight-setting and relevance-feedback strategies improve the accuracy of entity identification.
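The idea of comparing trade names as keyword sets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the splitting rule, the similarity measure, or the grouping procedure. The regex tokenizer, the weighted Jaccard similarity, the greedy grouping, the 0.5 threshold, and all function names (`split_keywords`, `weighted_jaccard`, `group_names`) are assumptions made for illustration only. Chinese trade names would need a proper word segmenter instead of the regex used here.

```python
import re

def split_keywords(name):
    """Split a trade name into a set of lowercase alphanumeric tokens.
    (Assumed tokenizer; the paper's splitting method is not given here.)"""
    return set(re.findall(r"[a-z0-9]+", name.lower()))

def weighted_jaccard(a, b, weights=None, default=1.0):
    """Weighted Jaccard similarity of two keyword sets.
    Keywords missing from `weights` get the `default` weight."""
    weights = weights or {}
    def w(k):
        return weights.get(k, default)
    union = sum(w(k) for k in a | b)
    if union == 0:
        return 0.0
    return sum(w(k) for k in a & b) / union

def group_names(names, threshold=0.5, weights=None):
    """Greedy grouping (an assumption): each name joins the most similar
    existing group whose representative set reaches the threshold,
    otherwise it starts a new group."""
    groups = []  # list of (representative keyword set, member names)
    for name in names:
        kws = split_keywords(name)
        best, best_sim = None, threshold
        for group in groups:
            sim = weighted_jaccard(kws, group[0], weights)
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append((kws, [name]))
        else:
            best[1].append(name)
    return [members for _, members in groups]

if __name__ == "__main__":
    names = [
        "Apple iPhone 8 64GB",
        "iPhone 8 Apple 64GB black",
        "Samsung Galaxy S9",
    ]
    for members in group_names(names):
        print(members)
```

In this sketch, raising the weight of a discriminative keyword (for example a model number) makes names that share it more similar, which is one simple way a weight-setting strategy could influence the grouping.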
Authors
AN Xianxi (安先喜)
TIAN Yingxin (田英鑫)
GUO Ziyang (郭子阳)
SHI Shengfei (石胜飞)
Affiliations: School of Economics and Management, Harbin University of Engineering, Harbin 150001, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Source
Journal of Harbin Engineering University (《哈尔滨工程大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2019, Issue 7, pp. 1334-1339 (6 pages)
Funding
National Key Research and Development Program of China (2016YFB1000703)
National Natural Science Foundation of China (U1509216, U1866602, 61472099, 61602129)
Keywords
entity
entity identification
electronic commerce
algorithm
databases
semantics
transaction description
data models