Abstract
Customs commodity HS code classification is an important international procedure in cross-border trade for both enterprises and individuals. HS code classification can be regarded as a text classification problem: given a description of a commodity, determine the category of the commodity as represented by its HS code. However, this task is more challenging than general text classification. First, commodity description texts are organized in a specific hierarchical structure. Second, commodity description texts exhibit sequential features at two levels. Third, the key information in commodity description texts is scattered, and the descriptions take diverse forms. Existing text classification methods cannot take all of these factors into account when capturing the key information in commodity description texts. To address this, this paper proposes a Text Sequence and Graph Information combination Neural Network (TSGINN) for customs commodity HS code classification. TSGINN formulates HS code classification as a subgraph classification problem on a word co-occurrence network, models the associations between non-contiguous words with a graph attention network, and uses a hierarchical long short-term memory network, guided by the hierarchical structure of commodity texts, to capture multi-level sequential information. Experiments on real-world customs commodity datasets show that TSGINN achieves better HS code classification results than the other methods compared.
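The abstract frames HS code classification as subgraph classification over a word co-occurrence network but does not specify how that network is built. The Python sketch below assumes a common sliding-window scheme, which is not confirmed to be the paper's construction: two words are linked whenever they appear within `window` consecutive tokens somewhere in the corpus, and each description is represented by the subgraph induced by its own words. The function names `tokenize`, `build_cooccurrence_graph`, and `description_subgraph`, the window size, the whitespace tokenizer, and the sample descriptions are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Hypothetical tokenizer: treat commas, semicolons and bars as spaces,
    # lowercase, and split on whitespace. Real commodity descriptions (e.g. in
    # Chinese) would need a proper word segmenter instead.
    for sep in ",;|":
        text = text.replace(sep, " ")
    return text.lower().split()


def build_cooccurrence_graph(docs, window=3):
    """Global word co-occurrence network (assumed sliding-window scheme).

    Returns a Counter mapping an undirected edge (u, v), stored with u < v,
    to the number of windows of `window` consecutive tokens containing both.
    """
    edges = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        # Documents shorter than the window are treated as a single window.
        for start in range(max(1, len(tokens) - window + 1)):
            span = set(tokens[start:start + window])
            for u, v in combinations(sorted(span), 2):
                edges[(u, v)] += 1
    return edges


def description_subgraph(description, graph):
    """Subgraph of the global network induced by one description's words.

    Nodes are the description's distinct tokens; edges are the global edges
    whose two endpoints both occur in the description (weights kept).
    """
    nodes = set(tokenize(description))
    edges = {e: w for e, w in graph.items() if e[0] in nodes and e[1] in nodes}
    return nodes, edges


if __name__ == "__main__":
    corpus = [  # hypothetical training descriptions
        "frozen shrimp, peeled, 1kg bag",
        "fresh shrimp, unpeeled",
        "cotton t-shirt, knitted, for men",
    ]
    graph = build_cooccurrence_graph(corpus, window=3)
    nodes, edges = description_subgraph("peeled frozen shrimp", graph)
    print(sorted(nodes))          # ['frozen', 'peeled', 'shrimp']
    print(sorted(edges.items()))  # [(('frozen', 'peeled'), 1), (('frozen', 'shrimp'), 1), (('peeled', 'shrimp'), 2)]
```

In this toy corpus, "frozen" and "peeled" are linked even though they are not adjacent in the first description; such non-contiguous associations are what a graph attention network could then weight when classifying the subgraph. The graph attention and hierarchical LSTM components of TSGINN are not sketched here.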
Authors
DU Shao-hua (杜少华)
WAN Huai-yu (万怀宇)
WU Zhi-hao (武志昊)
LIN You-fang (林友芳)
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Beijing 100044, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2021, Issue 4, pp. 97-103 (7 pages)
Keywords
Customs commodity
HS code
Text classification
Multi-level sequential information
Graph attention network