Research of Address Information Automatic Annotation Based on Deep Learning (cited by: 2)
Abstract Automatic annotation of text sequences can address the high cost of manual annotation that deep learning commonly faces. Targeting the way entities are expressed in address information, this paper constructs a representation model based on the entity boundary matrix (EBM) and, on that basis, proposes an automatic annotation algorithm that combines deep learning with a KNN label correction algorithm (K-Nearest Neighbours Correction Algorithm, KNN-CA) and requires no manually annotated training set. First, a preset residential estate dataset is collected and used to build the offline feature library and to initialize the online feature library. Next, the EBM is solved by a matching algorithm and optimized with KNN-CA, and data augmentation then yields an automatically annotated training set. A BiLSTM-CRF deep learning model is then trained and used to predict sequence labels for all address information that has not yet been annotated. Finally, KNN-CA is applied again to optimize the sequence labels for which the EBM can be solved, producing a sequence annotation corpus suited to Chinese Geospatial Named Entity (CGSNE) recognition and related research. Experiments show that the F1 score of the annotated data reaches 95.35%.
Authors LING Guang-ming (凌广明); XU Ai-ping (徐爱萍); WANG Wei (王伟) (School of Computer Science, Wuhan University, Wuhan, Hubei 430072, China; State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan, Hubei 430079, China)
Source Acta Electronica Sinica (《电子学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2020, No. 11, pp. 2081-2091 (11 pages)
Funding National Key Research and Development Program of China (No. 2017YFC0803700).
Keywords deep learning; automatic annotation; address information; KNN (K-Nearest Neighbours); corpus
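The abstract describes labelling addresses by matching them against a feature library and then scoring the result with F1, but it does not define the EBM or the matching algorithm. The sketch below is therefore only a generic illustration of the idea, not the authors' method: it uses greedy longest-match against a small dictionary to emit character-level BIO tags, and computes an entity-level F1 score. The gazetteer entries, the label names (`PROV`, `CITY`, `DIST`, `POI`), and the function names `annotate`, `spans`, and `f1` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical gazetteer (surface form -> entity type); not taken from the paper.
GAZETTEER: Dict[str, str] = {
    "湖北省": "PROV",
    "武汉市": "CITY",
    "武昌区": "DIST",
    "珞珈山": "POI",
}


def annotate(address: str, gazetteer: Dict[str, str]) -> List[str]:
    """Tag each character with a BIO label using greedy longest-match."""
    tags = ["O"] * len(address)
    max_len = max(map(len, gazetteer), default=0)
    i = 0
    while i < len(address):
        # Try the longest candidate substring first, then shorter ones.
        for span in range(min(max_len, len(address) - i), 0, -1):
            label = gazetteer.get(address[i:i + span])
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + span):
                    tags[j] = "I-" + label
                i += span
                break
        else:
            # No dictionary entry starts here; leave the character as "O".
            i += 1
    return tags


def spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, type) entity spans from BIO tags."""
    found: Set[Tuple[int, int, str]] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        if start is not None and not tag.startswith("I-"):
            found.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return found


def f1(pred: List[str], gold: List[str]) -> float:
    """Entity-level F1: a span counts only if boundaries and type both match."""
    p, g = spans(pred), spans(gold)
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)
```

Calling `annotate("湖北省武汉市武昌区珞珈山", GAZETTEER)` tags the four three-character entities in order. Removing an entry from the dictionary leaves the corresponding characters tagged `O`, which lowers recall and hence F1; closing gaps of this kind is the role the abstract assigns to the KNN-CA correction step and the BiLSTM-CRF model.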