Abstract
Starting from the self-organizing, anonymous, and dynamic nature of peer-to-peer (P2P) systems, this paper proposes a P2P network model for building and updating a latent semantic index (LSI) in a distributed environment, and designs an algorithm for merging reduced-dimension representations (RDRs) of document matrices that is suited to P2P systems. Using the signal and noise subspace model, it gives a theoretical analysis of the effectiveness of the RDR merging algorithm and of the preconditions the algorithm must satisfy. The effect of RDR merging on query precision was tested in Matlab 6.5 on the standard MED (Medlars) document collection. Theoretical analysis and numerical experiments both show that the algorithm can build and update a distributed LSI in P2P systems with low network overhead and a reduced singular value decomposition (SVD) computation cost, while keeping the loss in query precision within a tolerable range.
Source
《东南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2006, No. 1, pp. 39-42 (4 pages)
Journal of Southeast University: Natural Science Edition