Abstract
The tag fields of an HTML document differ in how their terms are distributed and in how well they represent the document's content. Based on these differences, we propose an improved vector space model, the PFTF model. Using the TREC12 queries, we compare the retrieval performance of the classical vector space model with that of the PFTF model in two settings: retrieval over a single tag field, and retrieval that combines the results of multiple document representations. The experiments show that the PFTF model improves on the classical model in both settings.
Source
Journal of the China Society for Scientific and Technical Information (《情报学报》)
CSSCI
Peking University Core Journal (北大核心)
2005, No. 4, pp. 433-437 (5 pages)
Funding
National Natural Science Foundation of China