Abstract
A new retrieval algorithm for English texts is proposed. Keywords are extracted from the texts, and a state matrix of the keywords is computed from their transition probabilities. Singular Value Decomposition (SVD) is applied to the state matrix, and the first singular vector is taken as the complex feature vector of the text. The cosine similarity between feature vectors is used to measure the similarity between the query and each document. Experimental results indicate that the algorithm outperforms the traditional LSA algorithm in both retrieval precision and computational efficiency.
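The abstract does not specify how keywords are extracted or exactly how the state matrix is built, so the following Python sketch fills those steps in with stated assumptions and should be read as an illustration of the general pipeline, not as the authors' method. Assumptions: a fixed, hypothetical keyword vocabulary (`VOCAB`) stands in for keyword extraction; entry (i, j) of the state matrix is the estimated probability that keyword j directly follows keyword i in the text; and the leading right singular vector is computed by power iteration on M^T M rather than by a full SVD routine. The sketch also uses real-valued vectors, whereas the authors describe a complex feature vector.

```python
import math
import re

# Hypothetical keyword vocabulary (assumption): the paper's keyword-extraction
# step is not described in the abstract, so a fixed list stands in for it.
VOCAB = ["matrix", "singular", "value", "retrieval", "text", "probability"]


def keyword_sequence(text, vocab=VOCAB):
    """Keep only vocabulary words, in the order they occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in vocab]


def state_matrix(seq, vocab=VOCAB):
    """Assumed construction: entry [i][j] estimates P(next keyword j | keyword i)."""
    n = len(vocab)
    index = {w: k for k, w in enumerate(vocab)}
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(seq, seq[1:]):  # consecutive keyword pairs
        counts[index[a]][index[b]] += 1.0
    for row in counts:  # normalize each nonempty row into probabilities
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return counts


def first_singular_vector(m, iters=200):
    """Leading right singular vector of m, via power iteration on m^T m."""
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]  # m v
        w = [sum(m[i][j] * mv[i] for i in range(n)) for j in range(n)]  # m^T (m v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # text has no keyword transitions
        v = [x / norm for x in w]
    return v


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def feature_vector(text):
    """Keyword sequence -> state matrix -> first singular vector."""
    return first_singular_vector(state_matrix(keyword_sequence(text)))


if __name__ == "__main__":
    docs = [
        "singular value decomposition of a state matrix",
        "text retrieval with transition probability",
        "matrix singular value text retrieval",
    ]
    q = feature_vector("singular value matrix")
    # Rank documents by cosine similarity to the query's feature vector.
    for d in sorted(docs, key=lambda d: cosine(q, feature_vector(d)), reverse=True):
        print(f"{cosine(q, feature_vector(d)):.3f}  {d}")
```

Because every entry of the state matrix is nonnegative, power iteration started from an all-ones vector stays nonnegative, which sidesteps the sign ambiguity of singular vectors when feature vectors from different documents are compared.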
Source
Computer Engineering
CAS
CSCD
PKU Core (Peking University Core Journals)
2011, No. 1, pp. 78-80 (3 pages)
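Funding
Project funded by the Education Department of Sichuan Province, "Linear Frequency-Modulated Signal Detection and Parameter Estimation Based on Chaotic Systems" (09ZB026)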
Keywords
text retrieval
transition probability
Singular Value Decomposition (SVD)
state matrix