Abstract
This paper presents a new application of information extraction and automatic classification. After analyzing the shortcomings of traditional classification methods, it introduces an improved text classification scheme based on latent semantic indexing (LSI). LSI is a new type of retrieval model based on singular value decomposition (SVD): the decomposition strengthens or weakens the semantic influence of each term in a document, which makes the semantic relationships between documents clearer. Noisy data with weak semantic association can then be removed easily at an early stage, which improves the accuracy of feature extraction and of the final classification.
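The SVD step described in the abstract can be illustrated with a minimal sketch. It assumes NumPy, a tiny hypothetical term-by-document count matrix, and a cutoff of k = 2 retained singular values; none of these come from the paper, and the helper names `lsi_project` and `cosine` are illustrative only.

```python
import numpy as np


def lsi_project(term_doc, k):
    """Return rank-k document vectors (one row per document).

    term_doc is a term-by-document matrix (rows: terms, columns: documents).
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular values; the rest are treated as noise.
    return (np.diag(s[:k]) @ Vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy counts, not data from the paper.
# Rows: car, automobile, engine, fruit, apple. Columns: documents d0..d3.
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # fruit
    [0, 0, 0, 1],  # apple
], dtype=float)

docs = lsi_project(A, k=2)

raw_sim = cosine(A[:, 0], A[:, 1])        # d0 vs d1 on raw term counts
lsi_sim = cosine(docs[0], docs[1])        # d0 vs d1 in the reduced space
unrelated_sim = cosine(docs[0], docs[3])  # d0 vs the fruit document d3

print(raw_sim, lsi_sim, unrelated_sim)
```

In this toy example the two vehicle documents d0 and d1 have a raw cosine similarity of 0.5, which rises to about 1.0 after truncation, while the unrelated document d3 stays at about 0. This is one way to read the abstract's claim that term influence is strengthened or weakened so that document relationships become clearer; a real classifier would still need a choice of k and a classification step on top of the reduced vectors.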
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal
2007, No. 14, pp. 192-194 (3 pages)
Funding
National High Technology Research and Development Program of China (863 Program) (No. 2003AA118070)
Keywords
latent semantic indexing
singular value decomposition
text classification
information extraction
information retrieval