Abstract
A question classification algorithm based on a semantic kernel function is proposed. The algorithm builds a Support Vector Machine (SVM) kernel function from the syntactic structure of the question. First, the given question is parsed into a syntax tree, and the question is represented by the subtrees of that tree. Second, features are extracted from the question at three levels (lexical, syntactic, and semantic) to form a richer feature space. Third, a kernel function is constructed over the question's syntax tree. Finally, latent semantic indexing (LSI) is combined with the lexical, syntactic, and semantic features of the question, and the semantic kernel maps the feature space into a more effective space in which the questions are classified. Experimental results on the TREC dataset show that a question feature space enhanced with lexical, syntactic, and semantic features improves classification accuracy.
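The abstract combines a kernel over syntax trees with an LSI-based semantic kernel inside an SVM. The sketch below illustrates that idea under stated assumptions; it is not the paper's implementation. It assumes a Collins-Duffy subset-tree kernel for the syntax-tree part, a projection of bag-of-words vectors onto the top-k left singular vectors of a term-document matrix for the LSI part, and a simple weighted sum to combine them. The nested-tuple tree encoding, the function names, and the parameters `alpha` and `lam` are all hypothetical.

```python
import numpy as np

# A parse tree is a nested tuple (label, child, ...); a leaf word is a plain str.
# This encoding is an illustrative assumption, not taken from the paper.


def _label(node):
    return node if isinstance(node, str) else node[0]


def _nodes(tree):
    """All internal (non-word) nodes of a tree, in preorder."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(_nodes(child))
    return nodes


def _production(node):
    """Grammar rule at a node, e.g. ('NP', 'NN') or ('NN', 'dna')."""
    return (node[0],) + tuple(_label(c) for c in node[1:])


def subset_tree_kernel(t1, t2, lam=0.5):
    """Collins-Duffy subset-tree kernel: weighted count of shared tree fragments.

    lam in (0, 1] down-weights larger fragments.
    """
    memo = {}

    def delta(n1, n2):
        key = (id(n1), id(n2))
        if key not in memo:
            if _production(n1) != _production(n2):
                memo[key] = 0.0
            elif all(isinstance(c, str) for c in n1[1:]):
                memo[key] = lam  # matching pre-terminals, e.g. (NN dna)
            else:
                value = lam
                for c1, c2 in zip(n1[1:], n2[1:]):
                    if not isinstance(c1, str) and not isinstance(c2, str):
                        value *= 1.0 + delta(c1, c2)
                memo[key] = value
        return memo[key]

    return sum(delta(a, b) for a in _nodes(t1) for b in _nodes(t2))


def lsi_projection(term_doc, k):
    """Rows are the top-k left singular vectors of a (terms x documents) matrix."""
    u, _, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k].T


def semantic_kernel(x, z, proj):
    """Inner product of two bag-of-words vectors after mapping into the LSI space."""
    return float((proj @ x) @ (proj @ z))


def combined_kernel(q1, q2, proj, alpha=0.5, lam=0.5):
    """Hypothetical mix: alpha * tree kernel + (1 - alpha) * LSI semantic kernel.

    Each question is a (parse_tree, bag_of_words_vector) pair.
    """
    (tree1, bow1), (tree2, bow2) = q1, q2
    return (alpha * subset_tree_kernel(tree1, tree2, lam)
            + (1 - alpha) * semantic_kernel(bow1, bow2, proj))


if __name__ == "__main__":
    # Toy questions "what is dna" / "what is rna"; vocabulary order is illustrative.
    vocab = ["what", "is", "dna", "rna", "gene"]
    t1 = ("SBARQ", ("WHNP", ("WP", "what")),
          ("SQ", ("VBZ", "is"), ("NP", ("NN", "dna"))))
    t2 = ("SBARQ", ("WHNP", ("WP", "what")),
          ("SQ", ("VBZ", "is"), ("NP", ("NN", "rna"))))
    print(subset_tree_kernel(t1, t2))  # 5.234375

    # Toy term-document matrix (rows follow vocab); invented for illustration.
    docs = np.array([[1, 1, 1, 1],
                     [1, 1, 0, 1],
                     [1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 1]], dtype=float)
    proj = lsi_projection(docs, k=2)
    bow1 = np.array([1, 1, 1, 0, 0], dtype=float)
    bow2 = np.array([1, 1, 0, 1, 0], dtype=float)
    print(combined_kernel((t1, bow1), (t2, bow2), proj))
```

A Gram matrix built from `combined_kernel` can be passed to any SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. A sum of valid kernels with non-negative weights is itself a valid kernel, so this kind of combination keeps the SVM's optimization well defined.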
Source
Journal of Shanghai Normal University (Natural Sciences), 2018, No. 1, pp. 53-56 (4 pages)
Funding
National Natural Science Foundation of China (61572326, 61702333)
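Shanghai Education Science Planning Project (C160049)
Capacity Building Project for Local Universities, Science and Technology Commission of Shanghai Municipality (17070502800)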