Abstract
Online chat text is loosely structured, short, and context dependent, so selecting features from it with the traditional TFIDF (Term Frequency Inverse Document Frequency) algorithm has significant shortcomings. To address this problem, this paper proposes a method that uses the chat topic to determine the range of feature selection in chat text, and verifies the method's effectiveness experimentally.
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2007, No. 5, pp. 202-204 (3 pages)
Computer Science