Abstract
This paper surveys the mainstream approaches to recognizing textual entailment and, based on the conjecture that a text T and a hypothesis H can both be generated from a latent mixture of topics, proposes a mixed topic model for recognizing textual entailment, together with a probabilistic model that describes how text is generated under it. The model treats T and H as different expressions of the same meaning, represented as multi-mode data: if T entails H, the two have similar topic distributions and share a mixed vocabulary and set of topics. Comparative experiments between the mixLDA and LDA models are designed, and the model is evaluated on the RTE-8 task, where a Support Vector Machine (SVM) classifies the sentence similarity produced by the model together with other lexical and syntactic features. The experimental results show that textual entailment recognition based on the mixed topic model achieves high accuracy.
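The claim that an entailing pair has similar topic distributions can be illustrated with a small sketch. The abstract does not say which similarity measure the authors use, so the Jensen-Shannon divergence below is an assumption, as are the function names `js_divergence` and `topic_similarity` and the toy topic vectors; in the paper's setting the vectors would come from the mixLDA model.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    With base-2 logarithms the value lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    # Midpoint distribution between p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_similarity(theta_t, theta_h):
    """Similarity in [0, 1]: 1 for identical topic distributions, 0 for disjoint ones.

    Hypothetical helper, not the paper's feature definition.
    """
    return 1.0 - js_divergence(theta_t, theta_h)


# Toy topic distributions over three topics (illustrative values only).
theta_text = [0.7, 0.2, 0.1]
theta_hyp_close = [0.6, 0.3, 0.1]
theta_hyp_far = [0.1, 0.1, 0.8]

print(topic_similarity(theta_text, theta_hyp_close))
print(topic_similarity(theta_text, theta_hyp_far))
```

A score of this kind would be one feature among several; the abstract states that the final decision is made by an SVM over the similarity and other lexical and syntactic features.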
Source
《计算机工程》
CAS
CSCD
Peking University Core Journals (北大核心)
2015, No. 5, pp. 180-184 (5 pages)
Computer Engineering
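Funding
National Natural Science Foundation of China, General Program: "Research on Resource Construction and Statistical Analysis for Chinese Textual Inference" (61173062)
Keywords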
textual entailment
topic model
multi-mode
mixed topic
latent semantics
Support Vector Machine (SVM)