Abstract
By analyzing the structure of plain-text, HTML, and XML files, this paper uses the classical vector space model (VSM) as an example to examine the different processing techniques applied to each of the three file types during information retrieval. In view of the shortcomings of the traditional VSM and the structural features of HTML and XML files, it also discusses how the N-Level VSM improves on the classical VSM.
Source
《科技情报开发与经济》
2009, No. 11, pp. 90-92 (3 pages)
Sci-Tech Information Development & Economy