Abstract
Traditional text filtering based on support vector machines (SVMs) represents both texts and user profiles with the vector space model (VSM). The VSM assumes that terms are linearly independent of one another. This assumption lets in lexical noise caused by variation in word choice, which degrades the performance of SVM-based text filtering. This paper proposes SVM text filtering in a semantic space, where texts and user profiles are represented by their latent semantics. The method uses singular value decomposition (SVD) to extract the latent semantic space of the texts, then trains an SVM in that space to obtain the user profile and the filtering threshold. Each text in the incoming stream is mapped into the semantic space, and the similarity between the user profile and the new text is computed there with the cosine measure. Experimental results show that the filtering performance of the method reaches 98.67%.
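The pipeline described above (SVD, then SVM training in the semantic space, then fold-in and cosine filtering) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the SVM formulation, the number of retained dimensions k, or the thresholding rule. The sketch therefore assumes the following: the standard LSI fold-in d_hat = S_k^-1 U_k^T d, a linear SVM without bias trained by Pegasos-style subgradient descent, the SVM weight vector serving as the user profile, and a hypothetical threshold rule that picks the cosine cut-off classifying the training documents best. All function names are hypothetical.

```python
import numpy as np


def latent_semantic_space(A, k):
    """Truncated SVD of a term-by-document matrix A (terms x docs).

    Returns U_k (terms x k) and the k largest singular values s_k.
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]


def fold_in(d, U_k, s_k):
    """Map a term vector d into the k-dim semantic space: d_hat = S_k^-1 U_k^T d."""
    return (U_k.T @ d) / s_k


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge loss.

    X holds one semantic-space vector per row, y holds labels in {+1, -1}.
    No bias term is learned; the weight vector w serves as the user profile
    (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    w = np.zeros(dim)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # standard Pegasos step size
            margin = y[i] * (X[i] @ w)     # margin under the current w
            w = (1.0 - eta * lam) * w      # shrink: regularisation step
            if margin < 1.0:               # hinge loss active: move towards y_i x_i
                w = w + eta * y[i] * X[i]
    return w


def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def pick_threshold(w, X, y):
    """Hypothetical rule: choose the cosine cut-off (a midpoint between
    neighbouring training scores) that classifies the training set best."""
    scores = np.array([cosine(w, x) for x in X])
    labels = np.asarray(y)
    s = np.sort(scores)
    candidates = np.concatenate(([s[0] - 1.0], (s[:-1] + s[1:]) / 2))
    best_theta, best_acc = candidates[0], -1.0
    for theta in candidates:
        acc = np.mean(np.where(scores >= theta, 1, -1) == labels)
        if acc > best_acc:
            best_theta, best_acc = theta, acc
    return float(best_theta)


def is_relevant(d, U_k, s_k, w, theta):
    """Filter decision for one incoming document d (raw term vector)."""
    return cosine(w, fold_in(d, U_k, s_k)) >= theta


# Toy data: terms 0-2 describe the user's topic, terms 3-5 do not.
A = np.array([[2, 1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0, 0],
              [1, 1, 2, 0, 0, 0],
              [0, 0, 0, 2, 1, 1],
              [0, 0, 0, 1, 2, 1],
              [0, 0, 0, 1, 1, 2]], dtype=float)
y = [1, 1, 1, -1, -1, -1]                 # relevance judgement per column of A

U_k, s_k = latent_semantic_space(A, k=2)
X = np.array([fold_in(A[:, j], U_k, s_k) for j in range(A.shape[1])])
w = train_linear_svm(X, y)
theta = pick_threshold(w, X, y)
print(is_relevant(np.array([1, 1, 0, 0, 0, 0.0]), U_k, s_k, w, theta))  # True
print(is_relevant(np.array([0, 0, 0, 0, 1, 1.0]), U_k, s_k, w, theta))  # False
```

Folding new texts in with the same U_k and s_k that were used for the training documents keeps training and filtering in one coordinate system. The cosine scores do not depend on which orthonormal basis the SVD returns for the retained subspace.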
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2005, Issue 3, pp. 664-665 (2 pages)
Funding
Key Project of the Fujian Provincial Science and Technology Program (001J005)
Keywords
text filtering
singular value decomposition
support vector machine
semantic space