Abstract
This paper proposes a Chinese temporal expression recognition method based on optimized dictionary features and dependency relations. First, to address inaccurate boundary detection and long-distance dependencies in Chinese temporal expressions, including expressions that span many tokens, the traditional temporal dictionary feature is refined by splitting the temporal dictionary into a temporal word dictionary and a temporal unit dictionary. Second, because traditional machine-learning-based recognition methods ignore the structural characteristics of temporal expressions, dependency features are extracted on top of the optimized dictionary features to capture the structural information of temporal expressions. Finally, the basic features, dictionary features, and dependency features are combined, and recognition is performed with a conditional random field (CRF) model. Experiments on a Chinese corpus show that the method achieves good recognition performance.
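The three feature groups named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The dictionary entries, the feature names, and the functions `token_features` and `sentence_features` are hypothetical, and the dependency heads are assumed to come from an external parser (here they are supplied by hand). The output is a list of per-token feature dictionaries of the kind CRF toolkits commonly consume.

```python
# Hypothetical dictionary contents; the abstract does not list the real entries.
TEMPORAL_WORDS = {"今天", "明天", "去年", "上午", "春节"}        # temporal word dictionary
TEMPORAL_UNITS = {"年", "月", "日", "时", "分", "秒", "周", "天"}  # temporal unit dictionary


def token_features(tokens, pos_tags, heads, i):
    """Build a feature dict for token i.

    tokens   : segmented words of one sentence
    pos_tags : part-of-speech tags, one per token
    heads    : dependency head index per token (-1 marks the root);
               assumed to be produced by some dependency parser
    """
    word = tokens[i]
    feats = {
        # Basic features: the token itself and its immediate context.
        "word": word,
        "pos": pos_tags[i],
        "has_digit": any(ch.isdigit() for ch in word),
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
        # Dictionary features, one per dictionary.
        "in_word_dict": word in TEMPORAL_WORDS,
        "ends_with_unit": word[-1:] in TEMPORAL_UNITS,
    }
    # Dependency features: dictionary evidence looked up on the head token.
    h = heads[i]
    if h >= 0:
        head = tokens[h]
        feats["head_word"] = head
        feats["head_pos"] = pos_tags[h]
        feats["head_in_word_dict"] = head in TEMPORAL_WORDS
        feats["head_ends_with_unit"] = head[-1:] in TEMPORAL_UNITS
    else:
        feats["head_word"] = "<ROOT>"
    return feats


def sentence_features(tokens, pos_tags, heads):
    """Feature dicts for every token in a sentence."""
    return [token_features(tokens, pos_tags, heads, i) for i in range(len(tokens))]


# Hand-made example: "他 2016年 5月 到达 北京", with hand-assigned heads.
example = sentence_features(
    ["他", "2016年", "5月", "到达", "北京"],
    ["r", "t", "t", "v", "ns"],
    [3, 2, 3, -1, 3],
)
```

Splitting the lookup into two dictionaries lets a token such as "2016年" fire the unit feature even though it is not itself a dictionary word. The head-based features then let a CRF tie it to "5月" when the two tokens form a single expression.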
Source
Journal of Information Engineering University (《信息工程大学学报》)
2016, No. 4, pp. 490-495 (6 pages)
Funding
National Social Science Fund of China (Grant No. 14BXW028)
Keywords
temporal expression
temporal expression recognition
temporal dictionary
conditional random fields
dependency parsing