Abstract
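To attract users' attention, most e-commerce websites fold many product attributes into the product title, so titles carry redundant information and become inconsistent. To address this problem, we propose a product title compression model based on the self-attention mechanism. Because a self-attention network cannot directly capture the sequential features of a product title, we use the temporal nature of the gated recurrent unit (GRU) to add sequential enhancement to the self-attention mechanism, trading a small computational cost for an overall performance gain on the product title compression task. Building on the public short-title dataset LESD4EC, we construct two product title compression datasets, LESD4EC_L and LESD4EC_S, and validate the model on them. A series of experiments shows that the proposed self-attention method for compressing lengthy product titles performs considerably better than other product title compression methods.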
E-commerce product title compression has received significant attention in recent years, since it can provide more specific information for cross-platform knowledge alignment and multi-source data fusion. Product titles usually contain redundant descriptions, which can lead to inconsistencies. In this paper, we propose a self-attention-based neural network for this task. Because self-attention networks cannot directly capture the sequence features of product names, we enhance the mapping networks with a dot-attention structure in which the query and key-value pairs are computed by a gated recurrent unit (GRU) based recurrent neural network. The proposed method improves the analytical capability of the model at a relatively low computational cost. Based on data from LESD4EC, we built two e-commerce datasets of product core phrases, named LESD4EC_L and LESD4EC_S, and subsequently tested the model on them. A series of experiments shows that the proposed model achieves better performance in product title compression than existing techniques.
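The core idea in the abstract, feeding GRU hidden states into dot-product attention as queries, keys, and values, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the layer sizes, projections, or output layer, so the function names (`make_gru`, `gru_encode`, `self_attention`), the dimensions, the random untrained weights, the omission of bias terms, and the use of scaled dot-product scores are all assumptions made for illustration.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_gru(input_dim, hidden_dim, rng):
    """One (W, U) weight pair per gate: update z, reset r, candidate h.
    Bias terms are omitted to keep the sketch short (assumption)."""
    return {
        gate: (init_matrix(hidden_dim, input_dim, rng),
               init_matrix(hidden_dim, hidden_dim, rng))
        for gate in ("z", "r", "h")
    }


def gru_step(params, x, h_prev):
    """Single GRU step: gates decide how much of h_prev to keep."""
    (Wz, Uz), (Wr, Ur), (Wh, Uh) = params["z"], params["r"], params["h"]
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h_prev))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h_prev))]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, gated))]
    return [(1 - zi) * hp + zi * ci for zi, hp, ci in zip(z, h_prev, cand)]


def gru_encode(params, xs, hidden_dim):
    """Run the GRU left to right; each state summarizes the tokens so far."""
    h = [0.0] * hidden_dim
    states = []
    for x in xs:
        h = gru_step(params, x, h)
        states.append(h)
    return states


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product attention with Q = K = V = GRU states, so the
    attention scores inherit word-order information from the recurrence."""
    d = len(states[0])
    contexts, weights = [], []
    for q in states:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in states]
        w = softmax(scores)
        weights.append(w)
        contexts.append([sum(wi * v[j] for wi, v in zip(w, states)) for j in range(d)])
    return contexts, weights


rng = random.Random(0)
# Hypothetical input: a 5-token product title with 4-dimensional embeddings.
embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
gru = make_gru(input_dim=4, hidden_dim=3, rng=rng)
states = gru_encode(gru, embeddings, hidden_dim=3)
contexts, weights = self_attention(states)
```

Here `contexts` holds one order-aware vector per title token and each row of `weights` is a probability distribution over the tokens. A compression model would still need a trained output layer on top, for example to decide which tokens to keep, and the abstract does not describe that layer.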
Authors
傅裕
李优
林煜明
周娅
FU Yu; LI You; LIN Yu-ming; ZHOU Ya (Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)
Source
《华东师范大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2019, No. 5, pp. 113-122, 167 (11 pages in total)
Journal of East China Normal University (Natural Science)
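Funding
National Natural Science Foundation of China (U1501252, U1811264, 61562014)
Key Project of the Guangxi Natural Science Foundation (2018GXNSFDA281049)
Guilin University of Electronic Technology Outstanding Graduate Thesis Cultivation Program (17YJPYSS17)
Research Project of the Guangxi Key Laboratory of Trusted Software (kx201916)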
Keywords
self-attention mechanism
product title compression
gated recurrent unit