Abstract
Because Chinese addresses are named irregularly and because of the characteristics of the Chinese language, recognizing Chinese address elements has become a key technique in geocoding. Traditional methods based on feature-character matching and dictionary (gazetteer) matching struggle to handle the diversity of address element names. Drawing on natural language processing (NLP) techniques, this paper constructs an address element annotation set and designs a Chinese address element recognition method based on conditional random fields (CRFs). Experiments show that, compared with the rule-based feature-character method, the CRF-based method substantially improves recognition performance. Because the CRF model generalizes well, the method is more broadly applicable, and it is particularly suited to batch parsing of large-scale address data and to fast geocoding in mass-market location-based services (LBS).
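As a rough illustration of what an address element annotation set and CRF input might look like, the sketch below expands a hand-segmented address into character-level B-/I- labels and builds a small context-window feature dictionary for each character. The element types (CITY, DISTRICT, ROAD, NUMBER), the sample address, the feature names, and the helper functions `to_bio` and `char_features` are all assumptions made for illustration; the abstract does not specify the actual annotation set or feature templates.

```python
from typing import Dict, List, Tuple

# Hypothetical element types; the actual annotation set is not given in the abstract.
Segment = Tuple[str, str]  # (text, element type)


def to_bio(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Expand annotated segments into per-character B-/I- labels."""
    labeled = []
    for text, kind in segments:
        for i, ch in enumerate(text):
            prefix = "B" if i == 0 else "I"
            labeled.append((ch, f"{prefix}-{kind}"))
    return labeled


def char_features(chars: List[str], i: int) -> Dict[str, str]:
    """Context-window features for position i (an assumed, simplified template)."""
    def at(k: int) -> str:
        # Positions outside the address are padded.
        return chars[k] if 0 <= k < len(chars) else "<PAD>"

    return {
        "c0": at(i),                  # current character
        "c-1": at(i - 1),             # previous character
        "c+1": at(i + 1),             # next character
        "c-1c0": at(i - 1) + at(i),   # left bigram
        "c0c+1": at(i) + at(i + 1),   # right bigram
    }


# Illustrative, hand-segmented address (not taken from the paper's data).
segments = [
    ("北京市", "CITY"),
    ("海淀区", "DISTRICT"),
    ("中关村大街", "ROAD"),
    ("27号", "NUMBER"),
]

labeled = to_bio(segments)
chars = [c for c, _ in labeled]
labels = [tag for _, tag in labeled]
X = [char_features(chars, i) for i in range(len(chars))]
```

Here `X` and `labels` form one training sequence in the shape that typical CRF toolkits accept (a list of per-token feature dictionaries with a parallel list of labels). A rule-based feature-character approach would instead split on suffix characters such as 市, 区, and 街, which is where irregular names tend to cause errors.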
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2010, No. 13, pp. 129-131 (3 pages)
Funding
National Natural Science Foundation of China, No. 40971231
Keywords
geocoding
Chinese address element
natural language processing
conditional random fields