Web Text Classification and Prediction Based on Thesaurus Match
Abstract: To classify Chinese text accurately, this paper proposes a classification method based on thesaurus matching. Texts in the test set are represented with the vector space model, feature weights are computed by principal component analysis based on term frequency-inverse document frequency (TF-IDF), and index thesauri for 47 industries are screened and built. The industry category of a text is then determined by its cosine similarity to the index thesauri, and an autoregressive integrated moving average (ARIMA) model is built to forecast the trend over the next 10 days. Experimental results show that the index thesauri achieve an average classification F-score of 85.6% and that the prediction model has an average relative error of 3.41%, which indicates that the classification method is effective.
Source: Computer and Modernization (《计算机与现代化》), 2017, No. 10, pp. 72-75 (4 pages)
Keywords: text classification; vector space model; principal component analysis; cosine similarity; autoregressive integrated moving average (ARIMA) model
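
The thesaurus-matching step named in the abstract and keywords can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it omits the principal component analysis step, assumes texts are already segmented into words, uses a smoothed IDF, and works on a toy two-industry corpus rather than the 47 industries in the paper. The names `build_thesauri`, `cosine`, `classify`, and all sample words are hypothetical.

```python
import math
from collections import Counter

def build_thesauri(corpus, top_k=3):
    """Build one weighted index thesaurus per industry.

    corpus: {industry: list of tokens}, a hypothetical pre-segmented sample.
    Returns {industry: {term: tf-idf weight}}, keeping the top_k terms.
    """
    n = len(corpus)
    # Document frequency: in how many industries each term occurs.
    df = Counter(t for tokens in corpus.values() for t in set(tokens))
    thesauri = {}
    for industry, tokens in corpus.items():
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) so shared terms keep a small weight.
        weights = {
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        }
        top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        thesauri[industry] = dict(top)
    return thesauri

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(tokens, thesauri):
    """Assign the industry whose thesaurus is most similar to the text."""
    vec = dict(Counter(tokens))  # raw term counts as the text vector
    return max(thesauri, key=lambda ind: cosine(vec, thesauri[ind]))

corpus = {
    "finance": ["bank", "loan", "interest", "bank", "market"],
    "automotive": ["car", "engine", "car", "market", "sales"],
}
thesauri = build_thesauri(corpus)
print(classify(["bank", "interest", "rates"], thesauri))  # prints: finance
```

Keeping only the top-weighted terms per industry mirrors the idea of a screened index thesaurus; the cutoff `top_k` is an illustrative choice, not a value from the paper.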