期刊文献+

C-TERN:一种基于CFSA的军事新闻文本时间信息处理算法 被引量:4

C-TERN: A Temporal Information Processing Algorithm of Chinese Military News Story Based on Cascade Finite State Automata
下载PDF
导出
摘要 提出一种基于层叠有限状态自动机(CFSA)的中文军事文本时间表达式识别与规范化算法C-TERN。C-TERN首先利用成熟的分词工具识别出文本中的时间词,然后将从通用语言和军事语言中提取的时间表达式规则分成多层,逐层进行时间信息的精细识别。在规范化过程中,通过4个步骤分别对特殊时间表达式、简单时间表达式、时间段表达式和绝对/相对时间表达式进行推理计算和规范化。算法考虑了规则集提取的正确性、规则之间冲突的消解以及匹配方式的合理性。在多个数据集上的实验结果显示,C-TERN不但能有效地识别标准时间、偏移时间和不确定性时间表达式,而且能完成对简单、特殊以及隐含的时间点、时间段和偏移时间的推理与规范化,能够满足军事文本时间信息处理的需要。 The authors propose a new method C-TERN to recognize and normalize the temporal expression in military story based on cascade finite state automata. Firstly, C-TERN recognizes the temporal expression in military story, and layers the temporal information extracted from general language and military language, and recognizes the temporal by layer. Then, in the procedure of temporal expression normalization, C-TERN ratiocinates and normalizes the simple/specify time, duration time, absolute and relative temporal expression in four steps. The method pays special attention to the correctness of the regulation extraction, the dispelling of the collision between regulations, and the reasonability of the matching method. The experimental results on multi-information show that proposed method can recognize and normalize the absolute and relative temporal expression as well as the simple/specify time and duration time effectively. It can better meets the temporal information processing needs in military applications.
出处 《北京大学学报(自然科学版)》 EI CAS CSCD 北大核心 2014年第1期9-16,共8页 Acta Scientiarum Naturalium Universitatis Pekinensis
基金 陕西省自然科学基金(2013JQ8031) 国家自然科学基金(2012AA011101) 武警工程大学军事基础研究基金(WJY201314)资助
关键词 自然语言理解 有限状态自动机 时间表达式 识别与规范化 natrual language processing finite state automata temporal expression recognition and normalization
  • 相关文献

参考文献16

二级参考文献90

  • 1王昀,苑春法.基于转换的时间-事件关系映射[J].中文信息学报,2004,18(4):23-30. 被引量:19
  • 2陈振宇,陈振宁.怎样计算现代汉语句子的时间信息[J].中文信息学报,2005,19(3):94-104. 被引量:6
  • 3H Y Tan. Chinese place automatic recognition research. In: C N Huang, Z D Dong, eds. Proc of Computational Language.Beijing: Tsinghua University Press, 1999
  • 4Zhang Huaping, Liu Qun, Zhang Hao, et al. Automatic recognition of Chinese unknown words recognition. First SIGHAN Workshop Attached with the 19th COLING, Taipei, 2002
  • 5S R Ye, T S Chua, J M Liu. An agent-based approach to Chinese named entity recognition. The 19th Int'l Conf on Computational Linguistics, Taipei, 2002
  • 6J Sun, J F Gao, L Zhang, et al. Chinese named entity identification using class-based language model. The 19th Int'l Conf on Computational Linguistics, Taipei, 2002
  • 7Lawrence R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc of IEEE, 1989,77(2): 257~286
  • 8Shai Fine, Yoram Singer, Naftali Tishby. The hierarchical hidden Markov model: Analysis and applications. Machine Learning,1998, 32(1): 41~62
  • 9Richard Sproat, Thomas Emerson. The first international Chinese word segmentation bakeoff. The First SIGHAN Workshop Attached with the ACL2003, Sapporo, Japan, 2003. 133~143
  • 10J Hockenmaier, C Brew. Error-driven learning of Chinese word segmentation. In: J Guo, K T Lua, J Xu, eds. The 12th Pacific Conf on Language and Information, Singapore, 1998

共引文献253

同被引文献27

引证文献4

二级引证文献11

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部