Text Classification Algorithm Based on Genetic Algorithm and Probability Theory

Abstract: This paper aims to improve both the accuracy and the speed of text classification. The tf (term frequency) algorithm first assigns initial weights to feature terms, and a stop-word list then masks function words that carry no content. The paper introduces an original probability-distribution method that applies an L-E operator for weighting. Feature terms that occur in special positions or are distributed widely through the text are weighted exponentially, so better results converge faster. A genetic algorithm with crossover and mutation operators and a suitable objective function speeds up retrieval and raises the probability of reaching the optimal result. The resulting hybrid algorithm can exclude interference from synonyms and non-feature terms.
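The first two steps in the abstract assign initial tf weights and then mask stop words. The Python sketch below illustrates them under stated assumptions. It computes each term's relative frequency over the whole token sequence and then drops terms found in a stop-word list. The stop-word list, the whitespace tokenizer, and the name `tf_weights` are hypothetical. This record does not give the paper's exact tf formula or stop-word list. Chinese text would first need word segmentation. The L-E operator weighting is not reproduced because its definition is not stated here.

```python
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given in this record.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}


def tf_weights(text: str, stop_words: set[str] = STOP_WORDS) -> dict[str, float]:
    """Assign initial tf weights, then mask stop words.

    tf(t) = count(t) / total tokens, computed before masking (an assumption).
    """
    tokens = text.lower().split()  # naive whitespace tokenizer (placeholder assumption)
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    # Step 1: initial weight is the relative term frequency.
    weights = {term: n / total for term, n in counts.items()}
    # Step 2: mask (drop) stop words that carry no content.
    return {term: w for term, w in weights.items() if term not in stop_words}


if __name__ == "__main__":
    doc = "the genetic algorithm and the text classifier use the genetic operators"
    for term, w in sorted(tf_weights(doc).items(), key=lambda kv: -kv[1]):
        print(f"{term}: {w:.3f}")
```

Here, tf is computed before stop words are masked, which follows the order the abstract describes. Normalizing after masking would instead raise every remaining weight by the same factor.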
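The genetic-algorithm step in the abstract uses crossover and mutation operators together with an objective function. The Python sketch below illustrates that general technique. This record does not say what a chromosome encodes or what the objective is, so both are assumptions. Each chromosome is assumed to be a binary mask over candidate feature terms. The objective is a hypothetical one: the summed weight of the selected terms (for example, tf weights) minus a fixed per-term penalty. Tournament selection, single-point crossover, bit-flip mutation, and elitism are standard GA choices, not necessarily the authors'. All function names are illustrative.

```python
import random

Mask = list[int]


def fitness(mask: Mask, weights: list[float], penalty: float) -> float:
    """Hypothetical objective: summed weight of selected terms minus a per-term penalty."""
    return sum(w for bit, w in zip(mask, weights) if bit) - penalty * sum(mask)


def tournament(pop: list[Mask], scores: list[float], rng: random.Random, k: int = 3) -> Mask:
    """Return the fittest of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a: Mask, b: Mask, rng: random.Random) -> Mask:
    """Single-point crossover: prefix of one parent, suffix of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(mask: Mask, rate: float, rng: random.Random) -> Mask:
    """Bit-flip mutation: each gene flips independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def ga_select_features(weights: list[float], penalty: float = 0.05, pop_size: int = 30,
                       generations: int = 100, mutation_rate: float = 0.05,
                       seed: int = 0) -> Mask:
    """Evolve a binary feature mask that maximizes the hypothetical objective."""
    if len(weights) < 2:
        raise ValueError("need at least two candidate terms for crossover")
    rng = random.Random(seed)
    n = len(weights)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(m, weights, penalty) for m in pop]
        # Elitism: the current best survives unchanged, so the best score never drops.
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        nxt = [elite]
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, scores, rng), tournament(pop, scores, rng), rng)
            nxt.append(mutate(child, mutation_rate, rng))
        pop = nxt
    return max(pop, key=lambda m: fitness(m, weights, penalty))


if __name__ == "__main__":
    # Made-up weights for five candidate terms, for illustration only.
    w = [0.30, 0.01, 0.25, 0.02, 0.20]
    print(ga_select_features(w))
```

Elitism is what lets the search keep its best answer while crossover and mutation continue exploring. That matches the abstract's claim of a higher probability of reaching the optimum, though this sketch does not measure any speedup.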
Authors: Song Qian (宋倩), Wang Dongming (王东明)
Source: Computer & Telecommunication (《电脑与电信》), 2015, No. 3, pp. 49-52 (4 pages)
Funding: Daxia Fund Project (大夏基金项目), project no. 2013DX-241
Keywords: genetic algorithm; text classification; feature term