Abstract
Accurate retrieval of text information from massive data helps people acquire new knowledge and work more efficiently. Traditional retrieval methods cannot respond to the changes in text features that arise in massive data, which lowers retrieval accuracy. This paper proposes a text information retrieval method based on feature clustering. The text information is first reduced in dimensionality, so that its main features are retained and the influence of redundant data is eliminated. During retrieval, text features are clustered by feature similarity, an objective function for the retrieval is defined and subjected to constraint conditions, and the cluster centers and feature weights are adjusted adaptively, which yields accurate retrieval of text information. Simulation results show that the improved algorithm raises both the accuracy and the efficiency of text information retrieval on massive data.
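The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes documents are already dimension-reduced feature vectors and swaps in a feature-weighted k-means: the objective is the weighted within-cluster dispersion, the constraint is that the feature weights are non-negative and sum to 1, and the centers and weights are re-estimated on every pass. The names `weighted_kmeans`, `retrieve`, `beta`, and the toy data are all hypothetical.

```python
def weighted_sq_dist(x, c, w, beta):
    """Squared distance with per-feature weights w_j raised to the power beta."""
    return sum((wj ** beta) * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))

def weighted_kmeans(X, k, beta=2.0, iters=20, eps=1e-9):
    """Assumed objective: sum_i sum_j w_j^beta * (x_ij - c_{a(i),j})^2,
    subject to sum_j w_j = 1 and w_j >= 0 (requires beta > 1)."""
    d = len(X[0])
    w = [1.0 / d] * d
    # Deterministic farthest-point initialisation of the cluster centers.
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(weighted_sq_dist(x, c, w, beta) for c in centers))
        centers.append(list(far))
    assign = [0] * len(X)
    for _ in range(iters):
        # 1) Assign each vector to its nearest center under the current weights.
        assign = [min(range(k), key=lambda c: weighted_sq_dist(x, centers[c], w, beta))
                  for x in X]
        # 2) Move each center to the mean of its members.
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # 3) Re-weight features: low within-cluster dispersion D_j gives a high weight.
        #    Closed form from the Lagrangian: w_j proportional to D_j^(-1/(beta-1)).
        D = [sum((x[j] - centers[a][j]) ** 2 for x, a in zip(X, assign)) + eps
             for j in range(d)]
        inv = [Dj ** (-1.0 / (beta - 1.0)) for Dj in D]
        total = sum(inv)
        w = [v / total for v in inv]
    return centers, w, assign

def retrieve(query, X, centers, w, assign, beta=2.0, top_n=3):
    """Search only the cluster nearest to the query, ranked by weighted distance."""
    best = min(range(len(centers)),
               key=lambda c: weighted_sq_dist(query, centers[c], w, beta))
    candidates = [i for i, a in enumerate(assign) if a == best]
    candidates.sort(key=lambda i: weighted_sq_dist(query, X[i], w, beta))
    return candidates[:top_n]

# Toy data: features 0 and 1 separate two topics, feature 2 is noise.
docs = [[0.0, 0.1, 0.5], [0.1, 0.0, 0.3], [0.0, 0.0, 0.7],
        [1.0, 0.9, 0.4], [0.9, 1.0, 0.6], [1.0, 1.0, 0.5]]
centers, weights, labels = weighted_kmeans(docs, k=2)
hits = retrieve([0.95, 0.95, 0.1], docs, centers, weights, labels)
print(weights, hits)
```

Restricting the search to one cluster is where the efficiency gain would come from in a scheme like this, while the learned weights reduce the influence of the noisy third feature on the ranking.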
Source
《计算机仿真》
CSCD
PKU Core (Peking University Core Journals)
2016, No. 4, pp. 429-432 (4 pages)
Computer Simulation
Funding
Supported by the China Postdoctoral Science Foundation (2013M541005)
Keywords
Huge amounts of data
Text information
Retrieval