Abstract
To use dependency relations effectively in classifying Chinese short texts, this paper examines four key issues that arise when dependency relations are applied to short text classification. Word pairs linked by a dependency relation are extracted from one long-text corpus and two short-text corpora, and these word pairs are used as features in classification experiments. The experiments show that: dependency relations can serve as effective features for text classification and can improve classification performance; using dependency relations alone as features does not improve classification performance on short texts; and dependency relations can be used as a means of feature expansion, adding features to short texts and strengthening their descriptive power so that short texts can be classified effectively.
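The feature construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the dependency parse is already available as a list of (head, dependent) word pairs from some parser, and the function names and the `head->dependent` feature encoding are hypothetical choices made only for this example.

```python
from collections import Counter


def word_features(tokens):
    """Bag-of-words counts for a segmented text."""
    return Counter(tokens)


def pair_features(dependencies):
    """Counts of dependency word pairs.

    `dependencies` is assumed to be a list of (head, dependent) tuples
    produced by an external dependency parser (not shown here).
    """
    return Counter(f"{head}->{dep}" for head, dep in dependencies)


def expanded_features(tokens, dependencies):
    """Word features expanded with dependency word-pair features."""
    features = word_features(tokens)
    features.update(pair_features(dependencies))  # adds the pair counts
    return features


# Hypothetical parse of a three-word short text.
tokens = ["classify", "short", "text"]
dependencies = [("classify", "text"), ("text", "short")]

pairs_only = pair_features(dependencies)
features = expanded_features(tokens, dependencies)
```

Here `pairs_only` corresponds to using dependency relations alone as features, while `features` corresponds to using them as feature expansion on top of the original words. Either dictionary could then be vectorized and passed to any standard classifier.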
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2010, No. 3, pp. 131-133, 141 (4 pages)
Computer Engineering and Applications
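Funding
National Natural Science Foundation of China (No. 60703010)
Natural Science Foundation of Chongqing (No. 2006BB2374)
Science and Technology Research Project of the Chongqing Municipal Education Commission (No. KJ070519)
Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education (教外司留[2007]1109号)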
Keywords
dependency relation
short text
text classification