Abstract
This paper presents a named entity recognition method for the tourism domain based on cascaded conditional random fields (CRFs). The low-layer CRF uses Chinese characters as the segmentation unit and combines feature dictionaries, including a list of characters common in tourist attraction names, a list of common attraction suffixes, and a list of characters common in location names, to recognize simple tourism named entities. Its results are passed to the high-layer model, which uses words as the segmentation unit and combines complex features to recognize nested tourist attractions, local specialties and snacks, and locations. Two groups of experiments were carried out. In open testing, the cascaded CRF model improves the F-score by 8 percentage points over the single-layer model; compared with the HMM model, it improves precision by 8 percentage points, recall by 22 percentage points, and the F-score by 15 percentage points.
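As an illustration only, the two-layer feature design described in the abstract might be sketched as follows. The dictionary contents, feature names, label strings, and the functions `char_features` and `word_features` are all hypothetical; the abstract does not give the paper's actual dictionaries or feature templates, and no CRF training or decoding is shown here.

```python
from typing import Dict, List, Set

# Hypothetical toy dictionaries; the paper's real lists are not given in the abstract.
ATTRACTION_CHARS: Set[str] = {"寺", "湖", "山"}       # characters common in attraction names
ATTRACTION_SUFFIXES: Set[str] = {"寺", "湖", "公园"}  # common attraction suffixes
LOCATION_CHARS: Set[str] = {"昆", "明", "州"}         # characters common in location names


def char_features(sentence: str, i: int) -> Dict[str, object]:
    """Low-layer features: one feature dict per character (assumed layout)."""
    ch = sentence[i]
    return {
        "char": ch,
        "prev_char": sentence[i - 1] if i > 0 else "<BOS>",
        "next_char": sentence[i + 1] if i < len(sentence) - 1 else "<EOS>",
        "in_attraction_chars": ch in ATTRACTION_CHARS,
        # True when the text up to and including this character ends with a known suffix.
        "ends_attraction_suffix": any(
            sentence[: i + 1].endswith(s) for s in ATTRACTION_SUFFIXES
        ),
        "in_location_chars": ch in LOCATION_CHARS,
    }


def word_features(words: List[str], low_labels: List[str], j: int) -> Dict[str, object]:
    """High-layer features: one feature dict per word, reusing the
    low-layer labels as input features (assumed layout)."""
    last = len(words) - 1
    return {
        "word": words[j],
        "low_label": low_labels[j],
        "prev_low_label": low_labels[j - 1] if j > 0 else "<BOS>",
        "next_low_label": low_labels[j + 1] if j < last else "<EOS>",
    }


# Usage: features for the last character of a toy sentence, then for a word
# whose low-layer label is assumed to have been predicted already.
low = char_features("昆明圆通寺", 4)
high = word_features(["昆明", "圆通寺", "附近"], ["LOC", "ATT", "O"], 1)
```

Feature dicts of this shape could be passed to any sequence labeler; the point of the sketch is only that the high layer sees the low layer's output labels as additional features.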
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal (北大核心)
2009, No. 5, pp. 47-52 (6 pages)
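Funding
National Natural Science Foundation of China (60863011, 60663004)
Doctoral Program Foundation of the Ministry of Education (20050007023)
Yunnan Province Reserve Talent Fund for Young and Middle-aged Academic Leaders (2007PY01-11)
Key Fund of the Yunnan Provincial Department of Education (07Z11139)
Kunming University of Science and Technology Doctoral Fund (2006-12)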
Keywords
computer application
Chinese information processing
tourism domain
named entity recognition
cascaded conditional random fields
feature template