Abstract
To address the complex word-formation patterns and grammatical rules of Lao organization names, an entity recognition framework that combines a multi-feature CRF (conditional random field) with an SVM (support vector machine) is proposed. Based on the word-formation characteristics of Lao organization names, each name is divided into a prefix word and a suffix word, and the prefix words are extracted to build a feature dictionary of organization names. The front boundary of each Lao organization name is determined using the dictionary and the SVM model, and a CRF model fusing multiple features is then used to recognize the organization names. Finally, the prefix words determined by the SVM are used to correct the CRF recognition results. Experimental results show that the method achieves a precision of 83.49% and a recall of 81.99%, demonstrating its effectiveness. The method combines the strengths of the SVM and CRF models and integrates linguistic features of Lao organization names, yielding good recognition results.
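The final correction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the romanized dictionary entries, the `confirm` callable standing in for the trained SVM, the BIO tag names, and the `max_gap` window are all assumptions made for illustration. The rule assumed here is that a CRF-predicted organization span is extended leftward when a confirmed prefix word sits just before it.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical romanized stand-ins for Lao prefix words (e.g. "ministry", "bank").
PREFIX_DICT: Set[str] = {"kasuang", "thanakhan"}


def front_boundaries(tokens: Sequence[str],
                     confirm: Callable[[Sequence[str], int], bool]) -> List[int]:
    """Return indices of dictionary prefix words accepted by the classifier.

    `confirm` is a placeholder for the SVM decision (assumption).
    """
    return [i for i, tok in enumerate(tokens)
            if tok in PREFIX_DICT and confirm(tokens, i)]


def correct_tags(crf_tags: Sequence[str], boundaries: Sequence[int],
                 max_gap: int = 2) -> List[str]:
    """Extend a CRF-predicted ORG span leftward to a confirmed prefix word."""
    tags = list(crf_tags)
    for b in boundaries:
        if tags[b] != "O":
            continue  # the CRF already placed this token inside an entity
        # Look for an ORG span that starts within max_gap tokens after the prefix.
        for start in range(b + 1, min(b + 1 + max_gap, len(tags))):
            if tags[start] == "B-ORG":
                tags[b] = "B-ORG"
                for k in range(b + 1, start + 1):
                    tags[k] = "I-ORG"
                break
            if tags[start] != "O":
                break  # a different entity intervenes; leave tags unchanged
    return tags


if __name__ == "__main__":
    tokens = ["khoy", "pai", "thanakhan", "haeng", "lao"]
    crf_tags = ["O", "O", "O", "B-ORG", "I-ORG"]  # CRF missed the prefix word
    bounds = front_boundaries(tokens, lambda toks, i: True)
    print(correct_tags(crf_tags, bounds))
```

In a real system the `confirm` callable would wrap a trained classifier and the tags would come from a CRF decoder. Both are left abstract here because the abstract does not specify the features or the exact correction rule.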
Authors
YAN Lei (晏雷)
ZHOU Lanjiang (周兰江)
ZHANG Jian'an (张建安)
ZHOU Feng (周枫)
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650000, China
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal (北大核心)
2020, Issue 19, pp. 122-125, 129 (5 pages)
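Funding
National Natural Science Foundation of China (61662040)
National Natural Science Foundation of China (61562049)
Yunnan Provincial Natural Science Foundation, General Program (2016FB101)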
Keywords
Lao
organization name recognition
multi-feature fusion
prefix word extraction
recognition result correction
experimental result analysis