Abstract
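Address matching is the core foundation of geocoding. To address the limited feedback interaction between existing address matching algorithms and address databases, this paper proposes a matching strategy that decomposes addresses at three levels: vocabulary, structure, and semantics. The vocabulary level defines granularity segmentation rules through a gazetteer and regular expressions combined with trailing-character features, and parses addresses at the word level with the address word element as the basic unit. The structure level defines address pattern types to instantiate data organization and performs model matching that takes the upper and lower levels of the address hierarchy into account. The semantic level abstracts a formal expression of address semantics to achieve address matching that incorporates deep semantics. In addition, building on address word element filtering, hierarchical structure segmentation, and address semantic understanding, fully parsed address data are continuously fed back as reference data, thereby binding the algorithm logic and integrating the results with database support. The method is validated on address data from Deqing County, Huzhou City, Zhejiang Province. Experimental results show that, over multiple sampling experiments with a low repetition rate, the average matching rate reaches 92.83% and the accuracy is 95.37%. Case analysis shows that, while improving the address reference library, the method improves algorithm performance and precision, effectively handles missing address structure and approximate semantic inference, and adapts to diverse address types.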
Address matching is the process of matching a descriptive address against the addresses in a standard address library, and it is the core foundation of geocoding. It converts location descriptions into spatial coordinates, building the association between text and coordinates. Chinese address data typically suffer from ambiguous expression, low standardization, and poor overall data quality. This situation has greatly increased the cost of building an address reference library, which places higher demands on address matching algorithms and prompts the exploration of integrated address matching strategies in practice. Because there is limited interaction between existing address matching algorithms and address database feedback, this paper presents an integrated processing strategy for address matching. It describes a progressive logical matching strategy across the vocabulary, structure, and semantic levels, which supports data organization while achieving deep text parsing. The vocabulary level parses the address structure to achieve word segmentation and text filtering from the character perspective. The structure level defines the data organization of the address model and performs quick indexing under the hierarchical structure. The semantic level is the formal expression of address semantics, integrating semantic understanding and information extraction methods. In addition, on the basis of address element filtering, hierarchical structure subdivision, and semantic understanding, we continuously feed back the fully parsed address data as reference data to achieve algorithm logic binding and results integration supported by the database. This improves both the efficiency of engine construction and the quality of the algorithm. To verify the proposed strategy, we select the address data of Deqing County, Huzhou City, Zhejiang Province and carry out a comparison experiment. The results show that our strategy achieves stable and satisfactory results in terms of matching rate, accuracy, and time. Compared with classical address matching algorithms, our strategy has clear advantages in accuracy and time savings. Under multiple sampling experiments with a low repetition rate, the average matching rate is 92.83% and the accuracy rate is 95.37%. Our results indicate that the proposed strategy can effectively solve matching problems such as missing address elements and approximate semantic calculation, and can improve the matching degree, matching rate, and matching efficiency. For address text elements that may carry multiple spatial meanings, spatial topology analysis needs to be further combined to optimize the accuracy of address element recognition.
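The vocabulary-level step described in the abstract (a gazetteer plus regular expressions keyed on trailing characters) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `GAZETTEER` entries, the suffix-to-level table `SUFFIX_LEVELS`, the function name `segment_address`, and the sample address string.

```python
import re

# Hypothetical gazetteer: known place names mapped to an address level.
GAZETTEER = {"浙江省": "province", "湖州市": "city", "德清县": "county"}

# Hypothetical trailing characters that mark the level of an address element.
# "街道" is listed before "街" so the longer suffix is tried first.
SUFFIX_LEVELS = {
    "街道": "town", "省": "province", "市": "city", "县": "county",
    "区": "county", "镇": "town", "乡": "town",
    "路": "road", "街": "road", "巷": "road", "号": "number",
}

# Lazy prefix followed by one of the suffixes, tried in the order given above.
SUFFIX_PATTERN = re.compile(r"(.+?)(" + "|".join(SUFFIX_LEVELS) + r")")


def segment_address(address):
    """Split an address string into (element, level) pairs."""
    elements = []
    pos = 0
    while pos < len(address):
        # 1) Prefer the longest gazetteer entry starting at this position.
        hits = [name for name in GAZETTEER if address.startswith(name, pos)]
        if hits:
            name = max(hits, key=len)
            elements.append((name, GAZETTEER[name]))
            pos += len(name)
            continue
        # 2) Otherwise fall back to the trailing-character regular expression.
        match = SUFFIX_PATTERN.match(address, pos)
        if match is None:
            # No recognizable suffix: keep the remainder as an unknown element.
            elements.append((address[pos:], "unknown"))
            break
        elements.append((match.group(0), SUFFIX_LEVELS[match.group(2)]))
        pos = match.end()
    return elements


if __name__ == "__main__":
    # Illustrative address string, not drawn from the paper's dataset.
    for element, level in segment_address("浙江省湖州市德清县武康街道永安街145号"):
        print(level, element)
```

The sketch covers only word-level parsing. The structure-level and semantic-level matching described in the abstract would operate on the resulting element list and are not modeled here.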
Authors
亢孟军
何欣阳
刘诚
王明军
高宇灵
KANG Mengjun; HE Xinyang; LIU Cheng; WANG Mingjun; GAO Yuling (School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China; Beijing Key Laboratory of Urban Spatial Information Engineering, Beijing 100038, China)
Source
《地球信息科学学报》
EI
CSCD
Peking University Core Journal
2023, Issue 7, pp. 1378-1385 (8 pages)
Journal of Geo-information Science
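Funding
Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2019-04-064)
National Key Research and Development Program of China (2022YFC3005700).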
Keywords
address language model
address matching
integration strategy
address reference library
address matching standardization