Abstract
Feature representation of documents is key to organizing and retrieving textual information. The classical vector space model (VSM) is the most important technique for this purpose, but it has shortcomings; for example, it cannot represent how feature terms are spatially distributed within a document. To describe documents more precisely and improve retrieval capability, the author proposes a modified method of calculating term weights. The method adds the local features of a term to its global information, retaining the characteristics of the traditional VSM approach while effectively integrating the local information. Finally, the author gives a concrete scheme and the corresponding algorithm.
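The abstract does not state the modified weighting formula, so the following is only a minimal sketch of the general idea, not the author's scheme. It assumes the global weight is classical TF-IDF and uses a hypothetical local factor, `spread`, defined here as the fraction of equal-sized document segments in which a term occurs. The function names, the segment count, and the `1 + spread` combination are all illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classical VSM weights: raw term frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def spread(doc, term, segments=4):
    """Hypothetical local factor (an assumption, not from the paper):
    fraction of equal-sized segments of the document that contain the term."""
    size = max(1, math.ceil(len(doc) / segments))
    chunks = [doc[i:i + size] for i in range(0, len(doc), size)]
    if not chunks:
        return 0.0
    return sum(term in chunk for chunk in chunks) / len(chunks)


def modified_weights(docs, segments=4):
    """Scale each global TF-IDF weight by (1 + spread), so a term spread
    throughout a document outweighs one clustered in a single place.
    The combination rule is an illustrative assumption."""
    base = tf_idf(docs)
    return [
        {term: w * (1 + spread(doc, term, segments)) for term, w in weights.items()}
        for doc, weights in zip(docs, base)
    ]
```

With two documents that contain a term equally often, classical TF-IDF assigns the same weight in both, whereas this sketch gives the higher weight to the document in which the term is distributed throughout the text rather than concentrated in one passage.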
Source
《情报理论与实践》
CSSCI
Peking University Core Journals (北大核心)
2009, Issue 4, pp. 115-117 (3 pages)
Information Studies: Theory & Application
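Funding
One of the research outcomes funded by the doctoral start-up project and a university-level project of Guangdong University of Business Studies (广东商学院)
Project No.: 06BS87001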