Abstract
This paper briefly describes the background of text filtering and proposes an example-based Chinese text filtering model. The model first analyzes the structure of the example texts supplied by the user, applies the text hierarchical analysis approach presented in this paper to extract text features, and forms user profiles represented by subject terms, which are then used to filter new text sources. Based on user feedback, the model then expands the set of example texts and applies a filtering method based on latent semantic indexing to refine the user profiles and improve filtering efficiency.
Source
Journal of Dalian University of Technology (《大连理工大学学报》)
CAS
CSCD
PKU Core (北大核心)
2000, Issue 3, pp. 375-378 (4 pages)
Keywords
Guangxi structure (广西结构)
Chinese text filtering model
TREC
semantic information
text filtering
text structure analysis
latent semantic indexing