Abstract
To extract administrative-division information effectively from non-standard Chinese addresses, a method based on conditional random fields (CRF) is proposed. Drawing on the way administrative divisions are expressed in Chinese addresses, the method uses a discriminative probabilistic model that models the target sequence conditioned on a known observation sequence. An expression model of administrative divisions is obtained by constructing a training corpus and the corresponding feature templates. The model is then applied to a test set, and its output is compared with the annotated test data to verify its performance. Experimental results show that the CRF model outperforms the maximum entropy model on overall performance metrics, and that the accuracy of address information parsing reaches 89.93%.
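The abstract does not spell out the tagging scheme or the feature templates, so the following is only a minimal sketch of what character-level training data for such a sequence labeller might look like. The BIO tag names (`PROV`, `CITY`, `DIST`), the one-character context window, the helper names `bio_labels` and `char_features`, and the sample address are all assumptions made for illustration; none of them come from the paper.

```python
def bio_labels(segments):
    """Expand [(text, tag), ...] into per-character (char, label) pairs.

    Tags other than "O" get a B- prefix on the first character and
    an I- prefix on the rest (hypothetical BIO scheme).
    """
    pairs = []
    for text, tag in segments:
        for i, ch in enumerate(text):
            if tag == "O":
                pairs.append((ch, "O"))
            else:
                pairs.append((ch, ("B-" if i == 0 else "I-") + tag))
    return pairs


def char_features(chars, i, window=1):
    """Context features for position i: unigrams in a window plus one bigram.

    This mimics a simple feature template; the real templates are not
    given in the abstract.
    """
    feats = {}
    for off in range(-window, window + 1):
        j = i + off
        feats[f"U[{off}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    prev = chars[i - 1] if i > 0 else "<PAD>"
    feats["B[-1,0]"] = prev + chars[i]
    return feats


# Illustrative annotated address (not taken from the paper's corpus).
segments = [
    ("湖北省", "PROV"),
    ("武汉市", "CITY"),
    ("洪山区", "DIST"),
    ("雄楚大街693号", "O"),
]
pairs = bio_labels(segments)
chars = [c for c, _ in pairs]
labels = [l for _, l in pairs]
features = [char_features(chars, i) for i in range(len(chars))]
```

Each `(features[i], labels[i])` pair is one observation/target position; a list of such sequences is the kind of input a CRF trainer would typically consume.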
Source
Journal of Wuhan Institute of Technology (《武汉工程大学学报》)
CAS
2015, Issue 11, pp. 47-51 (5 pages)
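Funding
National High Technology Research and Development Program of China (863 Program) (2013AA12A202)
Wuhan Institute of Technology Graduate Education Innovation Fund (CX2014090)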
Keywords
location information parsing; conditional random fields; training corpus