Design and implementation of Chinese address segmentation method based on LSTM networks (cited by: 6)
Abstract: Current Chinese address segmentation methods are mainly rule-based or rely on traditional machine learning. Both approaches require dictionaries and hand-crafted features that must be maintained manually over long periods. To avoid feature engineering and reduce manual maintenance, this paper applies long short-term memory (LSTM) networks and bi-directional long short-term memory (Bi-LSTM) networks separately to Chinese address segmentation, using a four-tag labeling scheme and character embeddings, and adds a large set of unlabeled Chinese addresses to improve segmentation performance. Experimental results on a self-built dataset show that both LSTM and Bi-LSTM perform well on this task, that Bi-LSTM performs slightly better, and that adding the unlabeled dataset effectively improves model performance.
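A four-tag labeling scheme turns segmentation into per-character classification, which is what allows an LSTM or Bi-LSTM to output one label per character. The following is a minimal Python sketch of converting between a segmented address and its tag sequence. It assumes the common B/M/E/S tag names (begin, middle, end, single); the abstract does not give the paper's exact labels. The functions `words_to_tags` and `tags_to_words` and the example address segmentation are hypothetical illustrations, not the authors' code or data.

```python
from typing import List

# Assumption: the four tags are named B/M/E/S (begin, middle, end, single),
# a common convention; the abstract does not give the paper's exact labels.


def words_to_tags(words: List[str]) -> List[str]:
    """Convert a segmented address into one tag per character."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            tags.append("B")  # first character of a word
            tags.extend(["M"] * (len(word) - 2))  # interior characters
            tags.append("E")  # last character of a word
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and a tag sequence (e.g. a tagger's prediction)."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        # A B or S after an unfinished word closes that word first.
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # E and S mark the end of a word.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # flush a word left open at the end of the input
        words.append(current)
    return words


# Hypothetical address and segmentation, for illustration only.
words = ["南京市", "江宁区", "东南大学路", "2", "号"]
chars = "".join(words)
tags = words_to_tags(words)
print(" ".join(tags))              # B M E B M E B M M M E S S
print(tags_to_words(chars, tags))  # ['南京市', '江宁区', '东南大学路', '2', '号']
```

In a setup like this, the network only has to emit one of four labels per character, and a decoding step such as `tags_to_words` turns its predictions back into address components.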
Authors: Zhang Wenhao; Lu Shan; Cheng Guang (Wuhan Research Institute of Posts & Telecommunications, Wuhan 430074, China; Nanjing Fiberhome Software Science & Technology Co. Ltd, Nanjing 210019, China; School of Computer Science & Engineering, Southeast University, Nanjing 211189, China)
Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core Journal, 2018, No. 12, pp. 3652-3654 (3 pages)
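Funding: National High Technology Research and Development Program of China ("863" Program) (2015AA015603); National Natural Science Foundation of China (61602114)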
Keywords: Chinese address; word segmentation; long short-term memory (LSTM); unlabeled dataset
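Related literature: references (8); secondary references (81); co-citing literature (421); co-cited literature (54); citing literature (6); secondary citing literature (29)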
