Abstract
Text information mining makes it more efficient to retrieve and use text information. To address problems with traditional methods, this paper proposes a text information mining method. First, terms are extracted from the text information and the cosine distance between the information content and each text category is estimated. Fuzzy rule reasoning is then combined with this cosine distance to obtain membership degrees. Next, a center estimation method based on mean density computes the average density of the text dataset and determines the clustering centers of the text information. Singular data points far from those clustering centers are deleted, which realizes text information mining in a big data environment. Experimental results show that the method effectively improves the precision ratio of text information mining and scales well.
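The abstract gives the pipeline but not its formulas, so the following is only a minimal sketch of the described steps under stated assumptions. Documents are raw term-frequency vectors over a fixed vocabulary. The fuzzy membership is approximated by normalising cosine similarities to each category, a simplified stand-in for fuzzy rule reasoning and not the paper's rules. "Density" is taken to be the number of neighbours within a cosine-distance `radius`. Cluster centres are chosen among points denser than the dataset's mean density, and points farther than `max_dist` from every centre are dropped as singular points. All function names and both thresholds are hypothetical.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Raw term-frequency vector over a fixed vocabulary (whitespace tokenizer)."""
    counts = Counter(text.lower().split())
    return [counts[t] for t in vocabulary]


def cosine_distance(a, b):
    """1 - cosine similarity; treats an all-zero vector as maximally distant (1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def membership(doc_vec, category_vecs):
    """Hypothetical fuzzy membership: cosine similarity to each category, normalised to sum to 1."""
    sims = {c: 1.0 - cosine_distance(doc_vec, v) for c, v in category_vecs.items()}
    total = sum(sims.values())
    if total == 0:
        # No overlap with any category: spread membership uniformly.
        return {c: 1.0 / len(sims) for c in sims}
    return {c: s / total for c, s in sims.items()}


def local_density(i, vecs, radius):
    """Assumed density: number of other points within `radius` (cosine distance) of point i."""
    return sum(1 for j, v in enumerate(vecs)
               if j != i and cosine_distance(vecs[i], v) <= radius)


def mean_density_centers(vecs, radius):
    """Choose centres among points whose density exceeds the dataset's mean density."""
    dens = [local_density(i, vecs, radius) for i in range(len(vecs))]
    mean_density = sum(dens) / len(dens)
    order = sorted(range(len(vecs)), key=lambda k: -dens[k])  # densest first
    centers = []
    for i in order:
        if dens[i] <= mean_density:
            break
        # Skip candidates lying within `radius` of an already chosen centre.
        if all(cosine_distance(vecs[i], vecs[c]) > radius for c in centers):
            centers.append(i)
    if not centers and vecs:
        centers.append(order[0])  # fallback: use the densest point
    return centers, mean_density


def remove_outliers(vecs, centers, max_dist):
    """Keep indices whose nearest centre lies within `max_dist`; the rest are singular points."""
    return [i for i, v in enumerate(vecs)
            if min(cosine_distance(v, vecs[c]) for c in centers) <= max_dist]


if __name__ == "__main__":
    vocab = ["data", "mining", "cluster", "football", "match", "goal"]
    docs = ["data mining cluster data", "mining data cluster",
            "cluster data mining mining", "football match goal",
            "goal football match goal", "match goal football match", "data goal"]
    vecs = [term_vector(d, vocab) for d in docs]
    centers, mean_d = mean_density_centers(vecs, radius=0.2)
    print(centers, remove_outliers(vecs, centers, max_dist=0.3))
```

With these toy documents, one centre is found for each topic group and the mixed document `"data goal"` is filtered out. In practice `radius` and `max_dist` would have to be tuned to the corpus, and the paper's own density and membership definitions may differ.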
Source
Modern Electronics Technique (《现代电子技术》)
Peking University core journal
2017, No. 23, pp. 123-126 (4 pages)
Keywords
big data
text information
information mining
precision ratio