Abstract
This paper studies a method for mining massive network text based on correlation mining. With the rapid development of computer and network technology, the amount of text on the network is growing massively. Traditional network text mining methods rely on feature extraction. They work for small data volumes, but they can no longer keep up as the amount of information grows rapidly. A massive network text mining method based on correlation mining is therefore proposed. First, feature extraction is applied to the massive text to perform preliminary classification and feature identification. Then correlation mining is used to compute the correlation between the features of the various texts. Finally, the texts are divided into categories according to the magnitude of the correlation, which completes the mining step. Correlation is used because it reflects the relationships between text features well. A set of randomly chosen popular internet terms was used in a test experiment. The results show that the algorithm adapts well to mining massive text and has good application value.
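The abstract does not name the feature extractor, the correlation measure, or the grouping rule. The sketch below is therefore a hypothetical illustration of the two-stage idea (extract features, then group texts by the correlation between their features), assuming term-frequency vectors as features, the Pearson correlation coefficient as the correlation measure, and a simple threshold rule for grouping. The function names `extract_features`, `pearson`, and `group_by_correlation` and the `threshold` parameter are illustrative inventions, not taken from the paper.

```python
# Hypothetical sketch, NOT the paper's method: term-frequency features,
# Pearson correlation, and threshold grouping are assumed choices.
import math
import re
from collections import Counter


def extract_features(texts: list[str]) -> tuple[list[str], list[list[float]]]:
    """Stage 1 (assumed): term-frequency vectors over a shared, sorted vocabulary."""
    tokenized = [re.findall(r"\w+", t.lower()) for t in texts]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def group_by_correlation(vectors: list[list[float]], threshold: float = 0.5) -> list[list[int]]:
    """Stage 2 (assumed): merge texts whose pairwise correlation >= threshold
    into the same group, using a union-find structure over text indices."""
    n = len(vectors)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if pearson(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    texts = [
        "stock market rises",
        "stock market falls",
        "football match tonight",
        "football match result",
    ]
    _, vecs = extract_features(texts)
    print(group_by_correlation(vecs, threshold=0.4))  # → [[0, 1], [2, 3]]
```

In the example, each pair of headlines shares two of three words, giving a correlation of 7/15 (about 0.47), while unrelated headlines correlate at -0.6, so a threshold of 0.4 separates the two topics. The all-pairs loop is quadratic in the number of texts, so this sketch does not by itself address the massive-data setting; the abstract does not say how the authors scale the correlation step.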
Source
Microelectronics & Computer
CSCD
Peking University Core Journal
2013, No. 10, pp. 157-160, 164 (5 pages)
Keywords
correlation mining
massive text
feature extraction