Abstract
Traditional word segmentation depends heavily on dictionaries. To reduce this dependence, this paper proposes a deep-learning-based algorithm for segmenting and reorganizing Chinese address elements. The address is first split with a Bigram dichotomy method. Then, using an address set drawn from online point-of-interest (POI) data as samples, a deep-learning-based method performs feature matching and reorganization of the address elements. The result is automatic segmentation of Chinese addresses with the address element as the unit. The experiments used tens of thousands of POI address records collected from the web as samples. The results show that the algorithm reduces reliance on dictionaries and also considerably improves the segmentation accuracy for place names and addresses.
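The abstract does not describe the model or the exact splitting rule. The following is a rough illustration only, built on one assumed reading of the two-stage pipeline. First, the address is broken into overlapping character bigrams. Second, a classifier decides for each bigram whether an element boundary falls between its two characters. Adjacent characters are then regrouped into address elements. The names `char_bigrams`, `segment`, `is_split`, and `toy_is_split` are hypothetical. `toy_is_split` is a hand-written stand-in for the trained deep-learning model and is not the authors' method.

```python
from typing import Callable, List

# Hypothetical boundary classifier: given a character bigram, return True if an
# address-element boundary lies between its two characters. In the paper this
# role would be played by a trained deep-learning model (assumption).
SplitFn = Callable[[str], bool]


def char_bigrams(address: str) -> List[str]:
    """Stage 1: overlapping character bigrams, e.g. "北京市" -> ["北京", "京市"]."""
    return [address[i:i + 2] for i in range(len(address) - 1)]


def segment(address: str, is_split: SplitFn) -> List[str]:
    """Stage 2: regroup characters into elements, cutting wherever is_split fires."""
    if not address:
        return []
    elements: List[str] = []
    current = address[0]
    for bigram in char_bigrams(address):
        if is_split(bigram):
            # Boundary between bigram[0] and bigram[1]: close the current element.
            elements.append(current)
            current = bigram[1]
        else:
            # Same element: extend it with the next character.
            current += bigram[1]
    elements.append(current)
    return elements


def toy_is_split(bigram: str) -> bool:
    """Toy stand-in for the model: cut after a few common address suffix characters."""
    return bigram[0] in "省市区县路街号"
```

For example, `segment("北京市海淀区莲花池西路28号", toy_is_split)` yields `["北京市", "海淀区", "莲花池西路", "28号"]`. Because the toy rule is itself a small suffix list, it reintroduces exactly the kind of dictionary dependence the paper aims to remove. A learned `is_split` that scores each bigram from data is the part this sketch leaves as an assumption.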
Authors
李一 (LI Yi)
刘纪平 (LIU Jiping)
罗安 (LUO An)
LI Yi; LIU Jiping; LUO An (Lanzhou Jiaotong University, Lanzhou 730070, China; Chinese Academy of Surveying and Mapping, Beijing 100830, China)
Source
Science of Surveying and Mapping (《测绘科学》)
CSCD
Peking University Core Journals (北大核心)
2018, No. 10, pp. 107-111 (5 pages)
Funding
Fundamental Research Funds of the Chinese Academy of Surveying and Mapping (7771605)