Abstract
For a paratactic language such as Chinese, understanding text content and reasoning over text semantics require recognizing the entailment relations between texts. After preprocessing the Chinese text (for example, word segmentation and stop-word removal), statistical features and lexical semantic features are extracted from each Chinese text pair. Textual entailment recognition is treated as a classification task: a classifier based on a support vector machine is designed and implemented on the extracted features to classify the entailment relation between the two texts of a pair. The experimental results indicate that recognizing Chinese textual entailment from statistical and lexical semantic features is feasible and effective.
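As a rough illustration of the statistical side of this pipeline, the sketch below computes a few surface-overlap features for one text pair. The abstract does not list the paper's features, so the feature set (hypothesis coverage, Jaccard overlap, length ratio, cosine similarity), the tiny stop-word list, and the names `STOP_WORDS` and `pair_features` are all assumptions made for illustration. The input is assumed to be already word-segmented. Lexical semantic features would need an external lexical resource and are not covered here.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not the paper's).
STOP_WORDS = {"的", "了", "是", "在"}


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def pair_features(text_tokens, hyp_tokens):
    """Return [coverage, jaccard, length_ratio, cosine] for a segmented pair.

    These four features are illustrative assumptions; the source does not
    specify which statistical features the authors used.
    """
    t = remove_stop_words(text_tokens)
    h = remove_stop_words(hyp_tokens)
    t_set, h_set = set(t), set(h)
    shared = t_set & h_set
    union = t_set | h_set

    # Share of distinct hypothesis words that also occur in the text.
    coverage = len(shared) / len(h_set) if h_set else 0.0
    # Jaccard overlap of the two vocabularies.
    jaccard = len(shared) / len(union) if union else 0.0
    # Hypothesis length relative to text length (in tokens).
    length_ratio = len(h) / len(t) if t else 0.0

    # Cosine similarity over raw term counts.
    t_counts, h_counts = Counter(t), Counter(h)
    dot = sum(t_counts[w] * h_counts[w] for w in shared)
    norm = math.sqrt(sum(v * v for v in t_counts.values())) * math.sqrt(
        sum(v * v for v in h_counts.values())
    )
    cosine = dot / norm if norm else 0.0

    return [coverage, jaccard, length_ratio, cosine]


if __name__ == "__main__":
    text = ["小明", "在", "北京", "买", "了", "一", "本", "书"]
    hypothesis = ["小明", "买", "了", "书"]
    print(pair_features(text, hypothesis))
```

Feature vectors of this kind, one per labelled text pair, could then be passed to any support vector machine implementation (for example `sklearn.svm.SVC` from scikit-learn) to train the entailment classifier.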
Source
《计算机工程与设计》
CSCD
Peking University Core Journal (北大核心)
2013, No. 5, pp. 1777-1782 (6 pages)
Computer Engineering and Design
Funding
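National Natural Science Foundation of China (61100133, 61173062)
Major Program of the National Social Science Foundation of China (11&Z189)
Key Project of the Humanities and Social Sciences Foundation of the Hubei Provincial Department of Education (2011jyte126)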
Keywords
textual entailment
statistical feature
lexical semantic feature
support vector machine
contradiction