Abstract
Recognizing the type of a news event is at its core a text classification problem, which can be approached with either pattern recognition or machine learning. News events on the Internet are highly varied: each type has its own structural characteristics and can be expressed in natural language in many ways, so pattern-based event extraction cannot cover every expression pattern and suffers from low recall. This paper instead applies machine learning to news event extraction, designing three kinds of features (lexical, syntactic, and semantic) and recognizing event types with a support vector machine. The support vector machine is well suited to classifying high-dimensional data such as natural language; it effectively captures the class differences among features and achieves good accuracy and recall.
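The approach summarized in the abstract, feature extraction followed by support vector machine classification, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes bag-of-words counts as a stand-in for the lexical features only (the syntactic and semantic features are not modelled), trains a linear SVM by plain subgradient descent on the regularised hinge loss, and combines binary classifiers one-vs-rest. The event types, the toy sentences, and every function name and hyperparameter below are hypothetical.

```python
from collections import Counter

def featurize(text):
    """Bag-of-words counts (assumed stand-in for lexical features)."""
    return Counter(text.lower().split())

def decision(w, b, x):
    """Linear decision value w.x + b for a sparse feature vector x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items()) + b

def train_binary_svm(samples, epochs=50, lam=0.01, lr=0.1):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss.

    samples: list of (Counter, label) pairs with label in {+1, -1}.
    The hyperparameters are illustrative, not taken from the paper.
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * decision(w, b, x)
            # Regularisation step: shrink every weight slightly.
            for f in list(w):
                w[f] *= 1.0 - lr * lam
            # Hinge-loss step: update only when the margin is violated.
            if margin < 1.0:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + lr * y * v
                b += lr * y
    return w, b

def train_one_vs_rest(docs):
    """Train one binary SVM per event type (one-vs-rest)."""
    feats = [(featurize(text), label) for text, label in docs]
    labels = sorted({label for _, label in feats})
    return {
        c: train_binary_svm([(x, 1 if l == c else -1) for x, l in feats])
        for c in labels
    }

def predict(models, text):
    """Return the event type whose classifier gives the highest score."""
    x = featurize(text)
    return max(models, key=lambda c: decision(*models[c], x))

# Hypothetical toy corpus; real news data would be far larger and noisier.
TOY_DOCS = [
    ("earthquake strikes coastal city", "disaster"),
    ("flood damages mountain villages", "disaster"),
    ("president signs national treaty", "politics"),
    ("parliament passes budget bill", "politics"),
    ("team takes championship final", "sports"),
    ("striker scores late goal", "sports"),
]

models = train_one_vs_rest(TOY_DOCS)
print(predict(models, "earthquake damages city"))
```

In practice a library SVM and a richer feature pipeline would replace the hand-written trainer; the sketch only shows how sparse, high-dimensional text features feed a margin-based classifier.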
Authors
李响
杨小琳
魏勇
董玮
胡涛
LI Xiang; YANG Xiaolin; WEI Yong; DONG Wei; HU Tao (Institute of Geographical Spatial Information, Information Engineering University, Zhengzhou 450001, China; Troops 31008, Beijing 10091, China)
Source
《地理信息世界》
2019, No. 2, pp. 73-78 (6 pages)
Geomatics World
Funding
Supported by the Open Fund of the State Key Laboratory of Geo-Information Engineering (SKLGIE2016-Z-4-2)