Abstract
Question texts in community-based question answering (Q&A) systems are short, contain few feature words, and carry weak descriptive information. To address this, this paper uses Wikipedia for feature extension to support the classification of short Chinese question texts. First, a set of related concepts is extracted for each word from Wikipedia concepts, links, and related information, and the relatedness between concepts is computed by combining link structure with category system information. Then, feature extension based on the related concept sets supplements the semantic information of the text features. Experimental results show that the proposed feature-extension-based short text classification algorithm effectively improves the classification of short question texts.
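The two steps named in the abstract (concept relatedness from links and categories, then feature extension) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `LINKS` and `CATEGORIES` tables stand in for data extracted from Wikipedia, Jaccard overlap is swapped in as a simple relatedness measure, and the weight `alpha`, the cutoff `threshold`, and `top_k` are hypothetical parameters. The abstract does not give the paper's actual formulas, so this should not be read as the authors' method.

```python
from typing import Dict, List, Set

# Hypothetical toy data standing in for links and categories taken from Wikipedia.
LINKS: Dict[str, Set[str]] = {
    "python": {"programming language", "guido van rossum", "interpreter"},
    "java": {"programming language", "jvm", "interpreter"},
    "coffee": {"drink", "caffeine"},
}
CATEGORIES: Dict[str, Set[str]] = {
    "python": {"programming languages", "scripting languages"},
    "java": {"programming languages"},
    "coffee": {"beverages"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets; an assumed stand-in for the paper's measures."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relatedness(c1: str, c2: str, alpha: float = 0.5) -> float:
    """Combine link-based and category-based scores with an assumed weight alpha."""
    link_score = jaccard(LINKS.get(c1, set()), LINKS.get(c2, set()))
    cat_score = jaccard(CATEGORIES.get(c1, set()), CATEGORIES.get(c2, set()))
    return alpha * link_score + (1 - alpha) * cat_score


def expand_features(tokens: List[str], top_k: int = 1, threshold: float = 0.2) -> List[str]:
    """Append the top_k most related concepts of each known token to the feature list."""
    expanded = list(tokens)
    for tok in tokens:
        if tok not in LINKS:
            continue  # the token has no concept entry, so there is nothing to expand
        scored = sorted(
            ((relatedness(tok, c), c) for c in LINKS if c != tok),
            reverse=True,
        )
        for score, concept in scored[:top_k]:
            if score >= threshold and concept not in expanded:
                expanded.append(concept)
    return expanded


if __name__ == "__main__":
    print(expand_features(["learn", "python"]))
```

With the toy tables above, the question tokens `["learn", "python"]` gain the extra feature `"java"`, which is the kind of semantic supplement that a downstream classifier would then consume.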
Source
《现代情报》
CSSCI
2013, No. 10, pp. 70-74 (5 pages)
Journal of Modern Information
Keywords
community-based Q&A
Wikipedia
feature extension
short text classification